## Getting Started: How to set up the Lab Environment

### Preparing your Data Science Workstation

In the course of this project, you will go through all the steps of a real deep learning workflow, including:

- data exploration
- feature engineering
- network architecture  
- model training
- model evaluation and tuning

Your data science workstation will be your main workplace. Let's get it set up.

**Notebook Environment**

In the following you will set up a deep learning environment locally or in the cloud. We recommend one of two options:
- run JupyterLab notebooks and install project dependencies locally
- use [_Google Colab_](https://colab.research.google.com/) cloud notebooks

You may choose the environment you are most familiar with. Another point to consider is whether you have a suitable GPU in your local computer. While not strictly necessary, running TensorFlow code on a GPU can significantly accelerate model training and thereby your model engineering workflow. Google Colab provides access to GPU-backed notebooks.
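
To find out whether TensorFlow can see a GPU in the environment you picked, a quick check along the following lines may help. It is a minimal sketch: the helper name `available_gpus` is made up for illustration, and it simply reports no GPUs if TensorFlow is not installed yet.

```python
import importlib.util


def available_gpus():
    """Return the names of the GPUs TensorFlow can see (empty list if none)."""
    if importlib.util.find_spec("tensorflow") is None:
        # TensorFlow is not installed in this environment yet.
        return []
    import tensorflow as tf

    return [device.name for device in tf.config.list_physical_devices("GPU")]


gpus = available_gpus()
print(f"TensorFlow sees {len(gpus)} GPU(s):", gpus)
```

An empty list does not stop you from working through the project; training will just run on the CPU.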


**Task**: _Choose a notebook environment and familiarize yourself with it, if needed._

**Dependencies**

Your workflow is going to rely on a number of open-source Python libraries that need to be installed in your environment. Among them are widely used libraries such as:

- [`numpy`](https://numpy.org)
- [`scipy`](https://scipy.org)
- [`pandas`](https://pandas.pydata.org)
- [`matplotlib`](https://matplotlib.org)
- [`librosa`](https://librosa.org)
- [`tensorflow`](https://www.tensorflow.org)

In addition to that, we are going to use: 

- [`python_speech_features`](https://github.com/jameslyons/python_speech_features/blob/master/example.py)

You have several options for installing and managing these dependencies.
- To set up locally, manage packages with [conda](https://docs.conda.io/en/latest/) or [pip](https://pypi.org/project/pip/).
- A standard Colab notebook comes with the common data science libraries preinstalled. You can run `pip` to install missing ones directly from the notebook.
- If you are new to the pip package manager, this [video](https://livevideo.manning.com/module/20_1_11/machine-learning-for-mere-mortals/the-basics/install-python-tools) shows you how to install pip.



**Task**: _Prepare a Python environment with the required dependencies._
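
One way to check the result of this task is to ask Python which of the listed libraries it can find. The sketch below assumes that each library's import name matches the name shown in the list above; the helper `missing_modules` is a hypothetical name used only for this example.

```python
import importlib.util

# Import names of the libraries listed above (assumed to match the package names).
REQUIRED_MODULES = [
    "numpy",
    "scipy",
    "pandas",
    "matplotlib",
    "librosa",
    "tensorflow",
    "python_speech_features",
]


def missing_modules(names=REQUIRED_MODULES):
    """Return the names from `names` that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules()
if missing:
    print("Missing:", ", ".join(missing))
    print("Try: pip install " + " ".join(missing))
else:
    print("All required libraries are available.")
```

In a notebook cell, the suggested `pip install` command can be run by prefixing it with `!`.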

### Data Source

> At Google, we’re often asked how to get started using deep learning for speech and other audio recognition problems, like detecting keywords or commands. [...] Perhaps more importantly, there aren’t many free and openly available datasets ready to be used for a beginner’s tutorial (many require preprocessing before a neural network model can be built on them) or that are well suited for simple keyword detection. 

-- [Google AI Blog](https://ai.googleblog.com/2017/08/launching-speech-commands-dataset.html)

The Speech Commands dataset curated by [Google AI](https://ai.google) is an excellent starting point for diving into machine learning on audio. It is a labelled set of short audio clips, each containing one spoken word, with roughly 2000 examples per word: a good foundation for training a robust model.

Make sure you have hard disk space for the roughly 8 GB of data - locally or in Google Drive, depending on your choice of notebook environment.

**Task**: _Acquire the data._

1. [_download the speech commands dataset from here_](https://lp-prod-resources.s3.us-west-2.amazonaws.com/234/google_speech_new.zip)
2. _decompress the archive to a suitable location and set the `data_path` variable_
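
If you prefer to decompress the archive from Python rather than with a desktop tool, the standard library's `zipfile` module is enough. The following is a sketch under assumptions: the helper name `extract_archive` is hypothetical, and the archive location is a placeholder for wherever you saved the download. The function returns the top-level names in the archive so you can see which folder `data_path` should point to.

```python
import zipfile
from pathlib import Path


def extract_archive(archive_path, target_dir):
    """Extract a zip archive into target_dir and return its top-level names."""
    archive_path, target_dir = Path(archive_path), Path(target_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(target_dir)
        # Member names inside a zip file always use "/" as the separator.
        top_level = sorted({name.split("/")[0] for name in archive.namelist()})
    return top_level


# Placeholder location of the downloaded file; adjust to where you saved it.
archive = Path("google_speech_new.zip")
if archive.exists():
    print(extract_archive(archive, Path(".")))
```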

On Colab, you can access files from your Google Drive by mounting it. This is done by running the following code:

```
from google.colab import drive
drive.mount('/content/drive')
```
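
Once the drive is mounted, `data_path` has to point into it. The sketch below picks a location depending on where the code runs. It assumes that the dataset was extracted to a `google_speech` folder at the top level of your Drive and that Colab exposes your Drive under `/content/drive/MyDrive`; both may differ in your setup, and `default_data_path` is a hypothetical helper name.

```python
import sys
from pathlib import Path


def default_data_path():
    """Pick a data folder depending on whether the code runs on Colab."""
    if "google.colab" in sys.modules:
        # Assumption: the archive was extracted to the top level of your Drive.
        return Path("/content/drive/MyDrive") / "google_speech"
    # Local setup: a folder next to the notebook.
    return Path(".") / "google_speech"


data_path = default_data_path()
print(data_path)
```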

In [1]:
from pathlib import Path

In [2]:
current_dir = Path('.')
data_path = current_dir / 'google_speech'

In [3]:
print(*data_path.iterdir(), sep="\n")
#ls {data_path}

google_speech\background
google_speech\test
google_speech\train
google_speech\validation
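
If the listing looks different on your machine, a small check can tell you which of the four folders shown above are missing. This is only a sketch; `missing_subfolders` is a hypothetical helper, and the expected names are taken from the output above.

```python
from pathlib import Path

# Folder names taken from the directory listing above.
EXPECTED_SUBFOLDERS = ("background", "test", "train", "validation")


def missing_subfolders(root, expected=EXPECTED_SUBFOLDERS):
    """Return the expected subfolder names that do not exist under root."""
    root = Path(root)
    return [name for name in expected if not (root / name).is_dir()]


# An empty list means the layout matches; a missing root reports all four names.
print(missing_subfolders(Path(".") / "google_speech"))
```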


Examine the _training data folder_:

In [4]:
print(*(data_path / "train").iterdir(), sep="\n")

google_speech\train\.DS_Store
google_speech\train\audio
google_speech\train\train.csv


**Task**: _Use `pandas` to read and examine the .csv file._

In [5]:
import pandas

In [6]:
train_data = pandas.read_csv(data_path / "train" / "train.csv")
train_data

| Unnamed: 0 | file_path | label | file_name |
|---|---|---|---|
| 0 | bed/00f0204f_nohash_0.wav | bed | 00f0204f_nohash_0.wav |
| 1 | bed/00f0204f_nohash_1.wav | bed | 00f0204f_nohash_1.wav |
| 2 | bed/0a7c2a8d_nohash_0.wav | bed | 0a7c2a8d_nohash_0.wav |
| 3 | bed/0b09edd3_nohash_0.wav | bed | 0b09edd3_nohash_0.wav |
| 4 | bed/0b56bcfe_nohash_0.wav | bed | 0b56bcfe_nohash_0.wav |
| ... | ... | ... | ... |
| 51083 | zero/ffd2ba2f_nohash_1.wav | zero | ffd2ba2f_nohash_1.wav |
| 51084 | zero/ffd2ba2f_nohash_2.wav | zero | ffd2ba2f_nohash_2.wav |
| 51085 | zero/ffd2ba2f_nohash_3.wav | zero | ffd2ba2f_nohash_3.wav |
| 51086 | zero/ffd2ba2f_nohash_4.wav | zero | ffd2ba2f_nohash_4.wav |
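
A natural next step when examining the file is to count how many clips each label has. The sketch below shows the idea with `value_counts` on a tiny stand-in frame that only mimics the columns of `train.csv`; the file names in it are invented for illustration, and `examples_per_label` is a hypothetical helper.

```python
import pandas


def examples_per_label(frame):
    """Count how many clips each label has, most frequent first."""
    return frame["label"].value_counts()


# Tiny stand-in frame with the same column names as train.csv, for illustration only.
demo = pandas.DataFrame(
    {
        "file_path": ["bed/a.wav", "bed/b.wav", "zero/c.wav"],
        "label": ["bed", "bed", "zero"],
    }
)
print(examples_per_label(demo))
```

Calling `examples_per_label(train_data)` would give the counts for the real data. The `Unnamed: 0` column is most likely a row index that was saved along with the file; passing `index_col=0` to `pandas.read_csv` would use it as the index instead of a separate column.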


Each entry in the `file_path` column points to an audio clip in the _training audio folder_. 
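
To confirm that the paths in the CSV really match files on disk, you could walk through the `file_path` column and collect every entry without a corresponding file. This is a minimal sketch using only the standard library; `missing_clips` is a hypothetical helper, and the paths in the usage comment assume the folder layout shown above.

```python
import csv
from pathlib import Path


def missing_clips(csv_path, audio_dir):
    """Return file_path entries from the CSV that do not exist under audio_dir."""
    audio_dir = Path(audio_dir)
    with open(csv_path, newline="") as handle:
        return [
            row["file_path"]
            for row in csv.DictReader(handle)
            if not (audio_dir / row["file_path"]).is_file()
        ]


# Usage, assuming the layout shown above:
# missing_clips(data_path / "train" / "train.csv", data_path / "train" / "audio")
```

An empty result means every row of the table points to an existing clip.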

In [7]:
train_audio_path = data_path / 'train' / 'audio'

You can easily load and display a single audio file in a notebook. Try this out to verify that you can access the data.

In [8]:
from IPython.display import Audio

In [9]:
example_audio_path = train_audio_path / 'zero' / 'ffd2ba2f_nohash_1.wav'
Audio(example_audio_path)
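
If the audio widget does not play in your setup, reading the file header is another way to verify access. The sketch below uses the standard library's `wave` module and assumes the clips are uncompressed PCM WAV files; `describe_wav` is a hypothetical helper, and the example path repeats the one used above.

```python
import wave
from pathlib import Path


def describe_wav(path):
    """Return sample rate (Hz), channel count, and duration (s) of a PCM WAV file."""
    with wave.open(str(path), "rb") as clip:
        rate = clip.getframerate()
        return {
            "sample_rate": rate,
            "channels": clip.getnchannels(),
            "duration": clip.getnframes() / rate,
        }


# Same example clip as above; nothing is printed if the file is not there.
example = Path("google_speech") / "train" / "audio" / "zero" / "ffd2ba2f_nohash_1.wav"
if example.exists():
    print(describe_wav(example))
```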

You are now all set to get started with the first milestone - exploring the data.