# Week 1: Day 1 AM // Introduction: Data Science Toolbox

“Data science” is about as broad a term as they come. It may be easiest to describe it by listing its more concrete components:

**Data exploration & analysis.**

- Included here: Pandas; NumPy; SciPy; a helping hand from Python’s Standard Library.

**Data visualization.** A fairly self-explanatory name: taking data and turning it into charts and plots that make its structure easier to see.

- Included here: Matplotlib; Seaborn; Plotly; others.

**Classical machine learning**. Conceptually, we could define this as any supervised or unsupervised learning task that is not deep learning (see below). Scikit-learn is by far the go-to tool for implementing classification, regression, clustering, and dimensionality reduction, while StatsModels complements it with tools geared toward statistical modeling and inference.

- Included here: Scikit-learn; StatsModels.

**Deep learning**. This is a subset of machine learning that is seeing a renaissance and is commonly implemented with Keras, among other libraries. It has improved dramatically since the early 2010s; a well-known milestone is AlexNet (2012), a deep convolutional network that won that year's ImageNet challenge by a wide margin.

- Included here: Keras and TensorFlow.

**Data storage and big data frameworks.** Big data is best defined as data that is either literally too large to reside on a single machine, or can’t be processed in the absence of a distributed environment. The Python bindings to Apache technologies play heavily here.

- Included here: Apache Spark; Apache Hadoop; HDFS; Dask; h5py/pytables.

**Odds and ends.** Includes subtopics such as natural language processing, and image manipulation with libraries such as OpenCV.

- Included here: NLTK; spaCy; OpenCV/cv2; scikit-image; Cython.
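As a quick way to see which of the libraries above are already available in your current Python environment, here is a minimal sketch that uses only the standard library. The import names in `TOOLBOX` are assumptions: some differ from the install name (scikit-learn is imported as `sklearn`, OpenCV as `cv2`, scikit-image as `skimage`), and the function name `check_toolbox` is made up for this illustration.

```python
from importlib.util import find_spec

# Import names are assumptions; several differ from the package's install name.
TOOLBOX = {
    "Data exploration & analysis": ["pandas", "numpy", "scipy"],
    "Data visualization": ["matplotlib", "seaborn", "plotly"],
    "Classical machine learning": ["sklearn", "statsmodels"],
    "Deep learning": ["keras", "tensorflow"],
    "Odds and ends": ["nltk", "spacy", "cv2", "skimage"],
}


def check_toolbox(toolbox=TOOLBOX):
    """Return {category: {module: found}} without importing the modules."""
    return {
        category: {name: find_spec(name) is not None for name in modules}
        for category, modules in toolbox.items()
    }


if __name__ == "__main__":
    for category, modules in check_toolbox().items():
        print(category)
        for name, found in modules.items():
            status = "installed" if found else "missing"
            print(f"  {name}: {status}")
```

A module reported as missing simply is not installed in the environment you ran the script from; it may still be present in another environment.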

## Anaconda Basics

Anaconda is a distribution of the Python and R programming languages for scientific computing that aims to simplify package management and deployment. The distribution includes data-science packages for Windows, Linux, and macOS.

Download Anaconda from [here](https://www.anaconda.com/products/individual).



### Managing Conda

Verify that conda is installed and running on your system by typing:

`conda --version`

Conda displays the number of the version that you have installed. You do not need to navigate to the Anaconda directory.

EXAMPLE: `conda 4.7.12`
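If you want to do the same check from a Python script, the following sketch may help. It assumes the output has the `conda X.Y.Z` shape shown above; the helper names `parse_conda_version` and `installed_conda_version` are hypothetical and not part of conda.

```python
import re
import shutil
import subprocess


def parse_conda_version(text):
    """Turn output such as 'conda 4.7.12' into a tuple like (4, 7, 12)."""
    match = re.search(r"conda\s+(\d+(?:\.\d+)*)", text)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


def installed_conda_version():
    """Run `conda --version` if conda is on PATH; return None otherwise."""
    conda_path = shutil.which("conda")
    if conda_path is None:
        return None
    result = subprocess.run(
        [conda_path, "--version"], capture_output=True, text=True
    )
    # Check both streams, since the version line may be written to either.
    return parse_conda_version(result.stdout + result.stderr)


if __name__ == "__main__":
    print(installed_conda_version())
```

Returning a tuple makes comparisons straightforward, for example `parse_conda_version("conda 4.7.12") >= (4, 6)`.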

Update conda to the current version. Type the following:

`conda update conda`

Conda compares versions and then displays what is available to install.

If a newer version of conda is available, type y to update:

`Proceed ([y]/n)? y`

### Managing Environments

Conda allows you to create separate environments containing files, packages, and their dependencies that will not interact with other environments.

When you begin using conda, you already have a default environment named base. You don't want to put programs into your base environment, though. Create separate environments to keep your programs isolated from each other.

Create a new environment and install a package in it.

We will name the environment `Hacktiv8` and install the package pandas. At the Anaconda Prompt or in your terminal window, type the following:

`conda create --name Hacktiv8 pandas`

Conda checks to see what additional packages ("dependencies") pandas will need, and asks if you want to proceed:

`Proceed ([y]/n)? y`

Type "y" and press Enter to proceed.

To use, or "activate" the new environment, type the following:

```sh
# The same command works on Windows, macOS, and Linux
conda activate Hacktiv8
```

Now that you are in your Hacktiv8 environment, any conda commands you type will go to that environment until you deactivate it.
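From inside Python you can also confirm which environment is active. This sketch relies on the `CONDA_DEFAULT_ENV` and `CONDA_PREFIX` environment variables that conda sets on activation; the function name `active_conda_env` is an assumption made for this example.

```python
import os


def active_conda_env(environ=os.environ):
    """Return (name, prefix) of the active conda environment, or (None, None)."""
    name = environ.get("CONDA_DEFAULT_ENV")  # e.g. "Hacktiv8"
    prefix = environ.get("CONDA_PREFIX")     # path to the environment directory
    return name, prefix


if __name__ == "__main__":
    name, prefix = active_conda_env()
    if name is None:
        print("No conda environment appears to be active.")
    else:
        print(f"Active environment: {name} ({prefix})")
```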

To see a list of all your environments, type:

`conda info --envs`

A list of environments appears, similar to the following:

```sh
conda environments:

    base           /home/username/Anaconda3
    Hacktiv8     * /home/username/Anaconda3/envs/Hacktiv8
```

The active environment is the one with an asterisk (*).
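To make the listing format concrete, here is a rough sketch that parses text in the shape shown above and picks out the active environment. The sample text and the function name `parse_env_list` are illustrative assumptions; real output can vary between conda versions, and paths that contain spaces are not handled.

```python
SAMPLE = """\
conda environments:

    base           /home/username/Anaconda3
    Hacktiv8     * /home/username/Anaconda3/envs/Hacktiv8
"""


def parse_env_list(text):
    """Return ({name: path}, active_name) from `conda info --envs` style text."""
    envs = {}
    active = None
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, comment lines, and the "conda environments:" header.
        if not line or line.startswith("#") or line.endswith(":"):
            continue
        parts = line.split()
        if len(parts) < 2:
            continue  # path-only entries are ignored in this sketch
        name, path = parts[0], parts[-1]
        envs[name] = path
        # The asterisk, when present, sits between the name and the path.
        if "*" in parts[1:-1]:
            active = name
    return envs, active


if __name__ == "__main__":
    envs, active = parse_env_list(SAMPLE)
    print(envs)
    print("active:", active)
```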

Change your current environment back to the default (base): `conda activate`

### Managing Python

When you create a new environment, conda installs the same Python version you used when you downloaded and installed Anaconda. If you want to use a different version of Python, for example Python 3.6, simply create a new environment and specify the version of Python that you want.

Create a new environment named "fox" that contains Python 3.6:

`conda create --name fox python=3.6`

When conda asks if you want to proceed, type "y" and press Enter.
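Once the new environment is activated, you can verify from Python that the interpreter really is the version you asked for. This is a minimal sketch; `python_matches` is a hypothetical helper name.

```python
import sys


def python_matches(major, minor, version_info=sys.version_info):
    """Return True if the interpreter's major.minor version matches."""
    return tuple(version_info[:2]) == (major, minor)


if __name__ == "__main__":
    print("Running Python", ".".join(str(n) for n in sys.version_info[:3]))
    # Inside the "fox" environment this is expected to print True.
    print("Is Python 3.6:", python_matches(3, 6))
```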

### Managing Packages

In this section, you check which packages you have installed, check which are available, and then look for a specific package and install it.

To find a package you have already installed, first activate the environment you want to search. See the `conda activate` command above for your Hacktiv8 environment.

Check whether a package you have not installed, "beautifulsoup4", is available from the Anaconda repository (you must be connected to the Internet):

`conda search beautifulsoup4`

Conda displays a list of all packages with that name on the Anaconda repository, so we know it is available.

Install this package into the current environment:

`conda install beautifulsoup4`

Check to see if the newly installed package is in this environment:

`conda list`
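The same check can be sketched in Python with the standard library's `importlib.metadata` (Python 3.8+). It assumes the package ships standard distribution metadata, which is usually but not always the case for conda-installed Python packages; `installed_version` is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of a package, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("beautifulsoup4")
    if found is None:
        print("beautifulsoup4 is not installed in this environment")
    else:
        print("beautifulsoup4", found)
```

Run this with the Python interpreter of the activated environment, since each environment has its own set of installed packages.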

## The Notebooks

### JupyterLab

JupyterLab is a web-based interactive development environment for Jupyter notebooks, code, and data. JupyterLab is flexible: configure and arrange the user interface to support a wide range of workflows in data science, scientific computing, and machine learning. JupyterLab is extensible and modular: write plugins that add new components and integrate with existing ones.

### Google Colab

Colaboratory, or "Colab" for short, allows you to write and execute Python in your browser, with:

- Zero configuration required
- Free access to GPUs
- Easy sharing

Whether you're a student, a data scientist, or an AI researcher, Colab can make your work easier.
