Commit

DOC: Emphasize NumPy in Ecosystem openers
Puts NumPy in the opening sentence of each Ecosystem section,
closing issue numpy#235.

Reduces text in the Visualization section, addressing a concern in numpy#238.

Closes #numpygh-235.
bjnath committed May 19, 2020
1 parent 3dc709a commit a038c0a
Showing 5 changed files with 84 additions and 106 deletions.
25 changes: 12 additions & 13 deletions layouts/partials/array-libraries.html
@@ -1,17 +1,16 @@
<!-- Array libraries Tab Content -->
<li class="array-libraries">
<p>
Numpy array forms the core of the organically growing numeric
Python <b>array library</b> ecosystem that now supports GPUs, sparse,
distributed arrays and more.
</p>
<p>
Several of these newer libraries such as CuPy, Sparse and Dask,
implement the NumPy API adding support for modern user cases,
newer hardware and higher scalability of array computing. Other
array libraries such as Xarray, TensorLy consume NumPy API and
build newer functionality on top of it, thus enhancing array
computing in Python beyond Numpy capabilities.
When libraries emerge to exploit new
hardware technologies and architectures, they take
NumPy as their starting point.
<a href="https://cupy.chainer.org">CuPy</a>,
<a href="https://sparse.pydata.org/en/latest/">Sparse</a>, and
<a href="https://dask.org/">Dask</a>
implement the NumPy API with support for modern user cases and
scalable hardware;
<a href="https://xarray.pydata.org/en/stable/index.html">Xarray</a> and
<a href="http://tensorly.org/stable/home.html">Tensor.ly</a> add newer functionality.
</p>
<table>
<tr class="highlight-th">
@@ -56,7 +55,7 @@
astronomy, satellite imagery and mobile network modeling.</td>
</tr>
<tr>
<td><img class="first-column-layout" src="images/content_images/arlib/CuPy.png" alt="CuPy"></td>
<td><img class="first-column-layout" src="images/content_images/arlib/cupy.png" alt="CuPy"></td>
<td class="full-center-text"><a href="https://cupy.chainer.org">CuPy</a></td>
<td class="left-text">NumPy-compatible matrix library accelerated by CUDA used to implement Neural Networks
for Deep Learning.</td>
@@ -82,7 +81,7 @@
</tr>
<tr>
<td><img class="first-column-layout" src="images/content_images/arlib/xtensor.png" alt="xtensor"></td>
<td class="full-center-text"><a href="" https://github.com/xtensor-stack/xtensor-python>xtensor </a> </td>
<td class="full-center-text"><a href="https://github.com/xtensor-stack/xtensor-python">xtensor</a> </td>
<td class="left-text">Multi-dimensional arrays with broadcasting and lazy computing for numerical
analysis.</td>
</tr>
63 changes: 40 additions & 23 deletions layouts/partials/data-science.html
@@ -8,27 +8,44 @@
</div>
<div>
<p>
Data Science makes it possible to analyze massive amounts of data
and gain meaningful insights. A typical data science workflow involves
various techniques and tools such as:
NumPy lies at the core of a rich ecosystem of data science libraries.
</p>
<p>
Data science is the analysis of massive amounts of data
to gain insight. A typical workflow might be:

<ul class="content-tab">
<li><b>Extract, Transform, Load (ETL):</b> Pandas, Beautiful Soup, Intake</li>
<li><b>Explore:</b> Seaborn, Matplotlib</li>
<li><b>Model:</b> Scikit-learn, SciPy, statsmodels</li>
<li><b>Evaluate:</b> NumPy, TensorFlow </li>
<li><b>Extract, Transform, Load (ETL):</b>
<a href="https://pandas.pydata.org">Pandas</a>,
<a href="https://www.crummy.com/software/BeautifulSoup/">Beautiful Soup</a>,
<a href="https://intake.readthedocs.io/en/latest/"> Intake</a>
</li>

<li><b>Explore:</b>
<a href="https://seaborn.pydata.org"> Seaborn</a>,
<a href="https://matplotlib.org">Matplotlib</a>,

</li>

<li><b>Model:</b>
<a href="https://scikit-learn.org">scikit-learn</a>,
<a href="https://www.scipy.org">SciPy</a>,
<a href="https://www.statsmodels.org/stable/index.html"> statsmodels</a>.
</li>

<li><b>Evaluate:</b>
NumPy,
<a href="https://www.tensorflow.org">TensorFlow</a>
</li>

<li>
<b>Presentation:</b>
<b>Display:</b>
<a href="./index.html/#tab-visual"> Data Visualization Tools</a>
</li>
</ul>
</p>
</div>
</div>
<p>
Python has a rich ecosystem of libraries that enable Data Science
workflows. <b> NumPy</b> is the foundation of almost all of these tools
such as Pandas, Seaborn, Beautiful Soup and several others.
</p>
<div class="grid-container">
<div>
<p>
@@ -37,13 +54,13 @@
data access and distribution, while
<a href="https://www.crummy.com/software/BeautifulSoup/">Beautiful Soup</a>
is widely used for web-scraping and gathering data sets.
<a href="https://seaborn.pydata.org"> Seaborn</a> is well known for its
<a href="https://towardsdatascience.com/how-to-perform-exploratory-data-analysis-with-seaborn-97e3413e841d">exploratory data analysis (EDA)</a>
capabilities, <a href="https://scikit-learn.org">Scikit-learn</a> and
<a href="https://www.scipy.org">Scipy</a> (statistical computing) serve some
<a href="https://seaborn.pydata.org"> Seaborn</a> is well known for
<a href="https://towardsdatascience.com/how-to-perform-exploratory-data-analysis-with-seaborn-97e3413e841d">exploratory data analysis (EDA)</a>;
<a href="https://scikit-learn.org">scikit-learn</a> and
<a href="https://www.scipy.org">SciPy</a> (statistical computing) serve some
of the backbone processes required for machine learning (regression methods,
classification, clustering, model validation and selection).
Statistical data exploration, estimation of various statistical models
Statistical data exploration, estimation of various statistical models,
and conducting statistical tests are some of the functions offered by
<a href="https://www.statsmodels.org/stable/index.html"> statsmodels</a>.
</p>
@@ -53,11 +70,11 @@
</div>
</div>
<p>
Effective data analytics require deep knowledge of the data domain (e.g.,
Retail, Healthcare, Marketing, Finance, Social Media, Automation, Sales, Travel,
etc.) as well as other core disciplines of Data Science, Data Engineering and
Data Visualization. Tools such as <a href="https://mlflow.org">MLFlow</a> address
experiment hyper-parameter and result tracking needs, while
Effective data analytics requires deep knowledge of the data domain (e.g.,
retail, healthcare, marketing, finance, social media, automation, sales, travel,
etc.) as well as other core disciplines of data science, data engineering, and
data visualization. Tools such as <a href="https://mlflow.org">MLFlow</a> address
experiment hyperparameter and result tracking needs, while
<a href="https://dvc.org"> DVC</a> provides data version control for data science
and machine learning workflows.
</p>
48 changes: 20 additions & 28 deletions layouts/partials/machine-learning.html
@@ -13,29 +13,21 @@
</div>
<div>
<p>
<b>Machine learning</b> (ML) enables computers to learn using
data, without having to be explicitly programmed.
<b>NumPy</b> is the foundation of all data pre-processing
that happens in the implementation of several ML Algorithms.
</p>
<p>
Python’s rich machine language and deep learning ecosystem
provides powerful tools such as
<a href="https://scikit-learn.org/stable/">Scikit-learn</a>
that is built on top of NumPy and
<a href="https://www.scipy.org">SciPy</a> and offers data
mining and analytics using classical ML algorithms.
</p>
<p>
<a href="https://www.tensorflow.org">Tensorflow’s</a>
deep learning capabilities help to define and run
computations involving tensors that have broad
applications in Speech and image recognition, Text-based
applications, Time-Series analysis and Video Detection.
<a href="https://pytorch.org">PyTorch </a> is another deep
learning library that is very popular among researchers for
computer vision and NLP applications. <a href="https://github.com/apache/incubator-mxnet">MXNet</a>
is another AI package that provides blueprints and
NumPy forms the basis of powerful machine learning libraries
like
<a href="https://scikit-learn.org">scikit-learn</a> and
<a href="https://www.scipy.org">SciPy</a>.
As machine learning grows, so does the
list of libraries built on NumPy.
<a href="https://www.tensorflow.org">TensorFlow’s</a>
deep learning capabilities have broad
applications &mdash; among them speech and image recognition, text-based
applications, time-series analysis, and video detection.
<a href="https://pytorch.org">PyTorch</a>, another deep
learning library, is popular among researchers in
computer vision and natural language processing.
<a href="https://github.com/apache/incubator-mxnet">MXNet</a>
is another AI package, providing blueprints and
templates for deep learning.
</p>
</div>
@@ -44,15 +36,15 @@
<div>
<p>
Statistical techniques called
<a href="https://towardsdatascience.com/ensemble-methods-bagging-boosting-and-stacking-c9214a10a205">Ensemble</a>
<a href="https://towardsdatascience.com/ensemble-methods-bagging-boosting-and-stacking-c9214a10a205">ensemble</a>
methods such as binning,
bagging, stacking and boosting are widely used in various ML
bagging, stacking, and boosting are among the ML
algorithms implemented by tools such as
<a href="https://github.com/dmlc/xgboost">XGBoost</a>,
<a href="https://lightgbm.readthedocs.io/en/latest/">LightGBM</a>,
<a href="https://catboost.ai">CatBoost</a> - one of the
<a href="https://lightgbm.readthedocs.io/en/latest/">LightGBM</a>, and
<a href="https://catboost.ai">CatBoost</a> &mdash; one of the
fastest inference engines.
<a href="https://www.scikit-yb.org/en/latest/">Yellowbrick</a>,
<a href="https://www.scikit-yb.org/en/latest/">Yellowbrick</a> and
<a href="https://eli5.readthedocs.io/en/latest/">Eli5</a>
offer machine learning visualizations.
</p>
21 changes: 7 additions & 14 deletions layouts/partials/scientific-domains.html
@@ -1,14 +1,12 @@
<!-- Scientific Domains Tab Content -->
<li class="scientific-domains">
<p>
Data acquisition (experimental, simulation), processing and
visualization are the core data related tasks in almost all the
scientific domains. Visualization of results through high quality
figures make scientific reports and publications easy to
understand. Python is easier to learn and computationally
efficient for scientific computing. NumPy, SciPy and Matplotlib
form the core Python packages that are used across various
scientific domains.
Nearly every scientist working in Python draws on the power of NumPy.
</p>
<p>
NumPy brings the computational power of languages like C and Fortran
to Python, a language much easier to learn and use. With this power
comes simplicity: a solution in NumPy is often clear and elegant.
</p>
<!-- First Row -->
<table>
@@ -99,7 +97,7 @@
<td class="center-text"></td>
<td class="center-text"></td>
<td class="center-text"><a
href="https://towardsdatascience.com/easy-steps-to-plot-geographic-data-on-a-map-python-11217859a2db">NumPy</a>
href="https://towardsdatascience.com/easy-steps-to-plot-geographic-data-on-a-map-python-11217859a2db">NumPy</a>
</td>
<td class="center-text"></td>
</tr>
@@ -122,9 +120,4 @@
<td class="lastrow-center-text"></td>
</tr>
</table>
<p>
NumPy’s powerful array processing capabilities and elegant syntax
helps to clearly and efficiently express computational algorithms
in various scientific computing domains.
</p>
</li>
33 changes: 5 additions & 28 deletions layouts/partials/visualization.html
@@ -54,47 +54,24 @@
</div>
<div>
<p>
<a href="https://www.slideshare.net/Visage/data-visualization-101-how-to-design-chartsandgraphs">Data
Visualization</a>
exposes patterns, trends and correlations in textual-data, making it easier for humans to
analyse and interpret large volumes of data.
</p>
<p>
<a href="https://python-graph-gallery.com">Visualization elements</a>
such as bar graphs, pie charts, line charts, maps, infographics, dashboards,
geographic maps, heatmaps, and interactive images offer valuable
insights for making data-driven decisions.
</p>
</div>
<div>
<p>
NumPy is the key data transformation building block for the burgeoning
<a href="https://pyviz.org/overviews/index.html">Python visualization landscape</a> comprising of
NumPy transforms the data in the burgeoning
<a href="https://pyviz.org/overviews/index.html">Python visualization landscape</a>, which includes
<a href="https://matplotlib.org">Matplotlib</a>,
<a href="https://seaborn.pydata.org">Seaborn</a>,
<a href="https://plot.ly">Plotly</a>,
<a href="https://altair-viz.github.io">Altair</a>,
<a href="https://docs.bokeh.org/en/latest/">Bokeh</a>,
<a href="http://holoviz.org">Holoviz</a>,
<a href="http://vispy.org">Vispy</a> and
<a href="http://vispy.org">Vispy</a>, and
<a href="https://github.com/napari/napari">Napari</a>,
to name a few.
</p>
<p>
By performing parallel operations on large arrays, all at once, NumPy accelerates data-processing and
visualization of large quantities of data, beyond Python's native performance levels, for data
visualization at
scale.
NumPy's accelerated processing of large arrays allows researchers to visualize
datasets far larger than native Python could handle.
</p>
</div>
<div>
<p>
<a href="https://rougier.github.io/python-visualization-landscape/landscape-colors.png">
<img src="images/content_images/vis-landscape.png"
alt="Mindmap linking several concepts, such as Javascript, Matplotlib, d3js and OpenGL."
align="left">
</a>
</p>
</div>
</div>
</li>
