Remove "verizon media", keep Yahoo.
leerho committed Nov 17, 2021
1 parent c091281 commit 07a921775057d072bd7264fa45681db20c3d408a
Showing 5 changed files with 7 additions and 7 deletions.
@@ -25,7 +25,7 @@ layout: doc_page
| Title | Data Sketching for Real Time Analytics: Theory and Practice |
| --------- | ---------------------------------------------------------------------------- |
| Synopsis | Explanation of the theory and operation of this new sketch. |
-| Presenter | Daniel Ting (Tableau Software), Jonathan Malkin (Verizon), Lee Rhodes (Verizon) |
+| Presenter | Daniel Ting (Tableau Software), Jonathan Malkin (Yahoo), Lee Rhodes (Yahoo) |
| Date | Aug 23, 2020 |
| Conference| [KDD 2020](https://www.kdd.org/kdd2020/tutorials/lecture-tutorials) |
| Link | <https://datasketches.apache.org/docs/Community/KDD_Tutorial_Summary.html> |
@@ -43,7 +43,7 @@ layout: doc_page
| Title | A Production Quality Sketching Library for the Analysis of Big Data |
| --------- | ---------------------------------------------------------------------------- |
| Synopsis | Quick review of Count Distinct, Quantiles, Frequent Items and other sketches. |
-| Presenter | [Lee Rhodes](https://www.linkedin.com/in/leerho/) from [Verizon Media (Yahoo)](https://www.verizonmedia.com/) |
+| Presenter | [Lee Rhodes](https://www.linkedin.com/in/leerho/) from [Yahoo](https://www.verizonmedia.com/) |
| Date | June 15, 2020 |
| Conference| [Spark+AI](https://www.youtube.com/channel/UC3q8O3Bh2Le8Rj1-Q-_UUbA) |
| Link | [YouTube](https://www.youtube.com/watch?time_continue=5&v=WPwCnswDbOU) |
@@ -119,11 +119,11 @@ The audience should learn about

Daniel Ting is a researcher in Tableau working primarily on data sketching with sketching work published in KDD, SIGMOD, and NeurIPS. His work in the area was initially inspired by problems he encountered while on Facebook's core data science team where he built systems for large scale online experimentation. He obtained his PhD in Statistics from U.C. Berkeley.

-Jon Malkin is a senior principal research engineer at Verizon Media and a contributor to the Apache DataSketches project. He has experience with large-scale data processing, both brute-force and with sketches, from roles in computational advertising and website traffic analytics. He obtained his PhD in Electrical Engineering from the University of Washington.
+Jon Malkin is a senior principal research engineer at Yahoo and a contributor to the Apache DataSketches project. He has experience with large-scale data processing, both brute-force and with sketches, from roles in computational advertising and website traffic analytics. He obtained his PhD in Electrical Engineering from the University of Washington.

## Contributor Bio

-Lee Rhodes is a Distinguished Architect at Yahoo (now Verizon Media). He created the DataSketches project in 2012 to address analysis problems in Yahoo's large data processing pipelines. DataSketches was Open Sourced in 2015 and is now a top level project in the Apache Software Foundation. He was an author or coauthor on sketching work published in ICDT, IMC, and JCGS. He obtained his Master's Degree in Electrical Engineering from Stanford University.
+Lee Rhodes is a Distinguished Architect at Yahoo. He created the DataSketches project in 2012 to address analysis problems in Yahoo's large data processing pipelines. DataSketches was Open Sourced in 2015 and is now a top level project in the Apache Software Foundation. He was an author or coauthor on sketching work published in ICDT, IMC, and JCGS. He obtained his Master's Degree in Electrical Engineering from Stanford University.

## Societal Impact

@@ -45,7 +45,7 @@ This library has been designed from the beginning to be high-performance and pro
The library is written in Java and C++ (with adaptors to Python), and contains state of the art algorithms for a variety of basic query classes, including identifying frequent items, unique count queries, computing quantiles and histograms, and sampling. It will soon contain algorithms for matrix analytic tasks such as PCA as well.
All algorithms in the library produce mergeable summaries, and come with formal guarantees on the accuracy of the answers returned.

-The original core contributors to the library are Lee Rhodes, Jon Malkin, and Alex Saydakov (all at Yahoo/Verizon Media), Justin Thaler (Assistant Professor at Georgetown University, Department of Computer Science), and Edo Liberty (Principal Scientist at Amazon Web Services and manager of the Algorithms group at Amazon AI), but we continue to grow our community.
+The original core contributors to the library are Lee Rhodes, Jon Malkin, and Alex Saydakov (all at Yahoo), Justin Thaler (Assistant Professor at Georgetown University, Department of Computer Science), and Edo Liberty (Principal Scientist at Amazon Web Services and manager of the Algorithms group at Amazon AI), but we continue to grow our community.

The library has been adapted throughout industry and government. For example, at Yahoo, where it was conceived and created, the library is widely used internally to reduce processing time from days to seconds for many tasks. At SpliceMachine, it is used for database query planning and optimization. It is also deeply embedded into a low-latency open source data store called Druid, as well as an open source graph database called Gaffer that is maintained by the British intelligence agency GCHQ. We have recently created an integration for PostgreSQL, and an integration into HP Vertica is being developed.

@@ -292,7 +292,7 @@ that affect the project are also documented on that channel.
> The project is independent from any corporate or organizational influence.
#### Yes.
-* Our project has committers and contributors from Verizon Media, Inc.; Hypercube, Inc.; Permutive, Inc. UK;
+* Our project has committers and contributors from Yahoo, Inc.; Hypercube, Inc.; Permutive, Inc. UK;
Tableau (Salesforce, Inc.); Georgetown University, Washington, D.C.; Warwick University, UK;
UC Berkeley; Apache Druid, and other researchers and engineers from around the world.

@@ -46,7 +46,7 @@ layout: front_page

<p>If approximate results are acceptable, there is a class of specialized algorithms, called streaming algorithms, or <a href="/docs/Background/SketchOrigins.html">sketches</a> that can produce results orders-of magnitude faster and with mathematically proven error bounds. For interactive queries there may not be other viable alternatives, and in the case of real-time analysis, sketches are the only known solution.</p>

-<p>For any system that needs to extract useful information from big data these sketches are a required toolkit that should be tightly integrated into their analysis capabilities. This technology has helped Yahoo (Verizon Media) successfully reduce data processing times from days or hours to minutes or seconds on a number of its internal platforms.</p>
+<p>For any system that needs to extract useful information from big data these sketches are a required toolkit that should be tightly integrated into their analysis capabilities. This technology has helped Yahoo successfully reduce data processing times from days or hours to minutes or seconds on a number of its internal platforms.</p>

<p>This project is dedicated to providing a broad selection of sketch algorithms of production quality. Contributions are welcome from those interested in further development of this science and art.</p>
</div>
