## Welcome to the McMaster Artificial Intelligence Society's Intro To Python Libraries!
#### In this short tutorial, we will go over some useful introductory Python libraries for data science, as well as their basic usage. <br> <br>The following libraries will be covered:

- JSON
- CSV
- tqdm
- glob
- numpy

### Let's begin by installing the necessary packages using the code below:<br>

In [None]:
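%pip install --upgrade pip
%pip install numpy
%pip install tqdm

# json, csv, and glob ship with Python's standard library, so they need no installation.
# The %pip magic installs into the environment of the running notebook kernel,
# even if you have multiple versions of Python installed on your computer.
# From a terminal (outside the notebook), drop the leading % and run, for example:
# python3 -m pip install --upgrade pip
# python3 -m pip install numpy tqdm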

___
## JSON: Read/write JSON files and handle them as dictionaries in Python.<br>
#### Let's start by creating a sample dictionary:

In [None]:
import json
keys = list(range(1, 11))
vals = list('abcdefghij')
d = dict(zip(keys, vals))
print(d)

#### Okay, now let's save it to our desktop:

In [None]:
outpath = '/Users/Victor/Desktop/sample.json'
with open(outpath, 'w') as outfile:
    json.dump(d, outfile)
# The with block closes the file automatically, so no explicit close() is needed.

###### NOTE: We could have also saved a list of dictionaries to our JSON file!
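
As a minimal sketch of the note above, a list of dictionaries can be written with the same `json.dump` call. The records and the temporary file location below are made up for illustration, so the example runs on any machine.

```python
import json
import os
import tempfile

# Hypothetical records, used only for illustration
records = [{'id': 1, 'letter': 'a'}, {'id': 2, 'letter': 'b'}]

# Write to a temporary folder instead of a hard-coded desktop path
records_path = os.path.join(tempfile.mkdtemp(), 'records.json')
with open(records_path, 'w') as outfile:
    json.dump(records, outfile)

# Read the file back to confirm the list survived the round trip
with open(records_path) as infile:
    loaded_records = json.load(infile)

print(loaded_records)
```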
___

#### We can view a JSON file in a standard text editor or in our IDE (Sublime, PyCharm, etc.). Alternatively, we can install the command-line tool 'jq'. On macOS or Linux, it can be installed with Homebrew (Windows users will need a different installer):

*brew install jq* <br> 
#### Then, simply add the following line to your .bash_profile:
###### alias json='function __json() { jq -C . $* | less -R; unset -f __json; }; __json'

### Great, now what about reading JSON files in Python? <br>

In [None]:
data_path = '/Users/Victor/Desktop/sample.json'
with open(data_path) as jsonfile:
    data = json.load(jsonfile)
# No close() call is needed here: the with block has already closed the file.
print(data)
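
One detail worth knowing: JSON object keys are always strings, so integer keys like the ones in our sample dictionary come back as strings after loading. Here is a small sketch, assuming a shortened dictionary and the string-based `json.dumps`/`json.loads` in place of files:

```python
import json

# A shortened version of the sample dictionary, with integer keys
d = dict(zip(range(1, 4), 'abc'))

# json.dumps/json.loads behave like dump/load, but work on strings instead of files
round_tripped = json.loads(json.dumps(d))
print(round_tripped)  # the keys are now strings

# Convert the keys back to integers if the original types matter
restored = {int(k): v for k, v in round_tripped.items()}
print(restored == d)
```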

#### We can print the keys and values:<br>

In [None]:
print(data.keys())
print('\n')
print(data.values())

#### We can also merge one dictionary into another with update():<br>

In [None]:
d2 = {11: 'k'}  # integer key, to match the integer keys already in d
d.update(d2)
print(d)
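
Note that `update()` changes the dictionary in place. If you would rather keep both originals untouched, one option is to build a new dictionary by unpacking. This is a sketch with made-up values:

```python
d = {1: 'a', 2: 'b'}
d2 = {3: 'c'}

# Build a new dictionary; if a key appears in both, the later one wins
merged = {**d, **d2}

print(merged)
print(d)  # the original dictionary is unchanged
```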

___
## CSV: Read/write CSV files and handle them as lists in Python.<br>
#### Let's start by creating a list of string items:

In [None]:
my_list = [str(x) for x in range(1, 11)]
divided_list = [tuple(my_list[x:x+2]) for x in range(0, 9, 2)]
print(divided_list)

#### Now let's save this information to a CSV file, where each line is composed of one of the tuples, and each of the two numbers in a tuple is in a separate cell:

In [None]:
outpath = '/Users/Victor/Desktop/sample.csv'
with open(outpath, 'w') as outfile:
    for entry in divided_list:
        outfile.write(','.join(entry) + '\n')
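
Joining the values by hand works for simple data like ours, but it breaks as soon as a value itself contains a comma. As a sketch of an alternative, the `csv` module's `writer` takes care of quoting for us. The shortened list and the temporary file path below are assumptions made for illustration.

```python
import csv
import os
import tempfile

divided_list = [('1', '2'), ('3', '4'), ('5', '6')]

# Temporary location so the sketch runs anywhere
csv_path = os.path.join(tempfile.mkdtemp(), 'sample.csv')

# newline='' lets the csv module control the line endings itself
with open(csv_path, 'w', newline='') as outfile:
    writer = csv.writer(outfile)
    writer.writerows(divided_list)

# Read the rows back; each row comes back as a list of strings
with open(csv_path, newline='') as infile:
    rows = list(csv.reader(infile))

print(rows)
```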

#### Let's read our CSV file:

In [None]:
import csv
data_path = '/Users/Victor/Desktop/sample.csv'
with open(data_path, newline='') as csvfile:  # newline='' is recommended when using the csv module
    csvreader = csv.reader(csvfile)
    for entry in csvreader:
        print(entry)

___

## tqdm: A convenient progress bar tool to display the progress of a for loop.<br>
#### Let's see an example:

In [None]:
# tqdm_notebook is deprecated; tqdm.notebook.tqdm is its replacement
from tqdm.notebook import tqdm

my_list = list(range(0, 25000000))

for i in tqdm(my_list, total=len(my_list), desc="A progress bar!"):
    pass

### A few notes:
1. When working with a dictionary, set **total=len(my_dict)**
2. When working outside of Jupyter, use **from tqdm import tqdm** instead of the notebook-specific import
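
To illustrate both notes, here is a minimal sketch that loops over a dictionary with the plain `tqdm` import, so it also runs outside of Jupyter. The dictionary contents are made up for the example.

```python
from tqdm import tqdm

# Hypothetical data: each number mapped to its square
my_dict = {k: k * k for k in range(1000)}

running_total = 0
# Iterate over key/value pairs, passing the total explicitly as in note 1
for key, value in tqdm(my_dict.items(), total=len(my_dict), desc="Summing values"):
    running_total += value

print(running_total)
```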

___
## glob: A package for working with multiple files. <br>
### Let's assume we're working with a folder containing multiple files:

In [None]:
from glob import glob

paths = glob('/Users/Victor/Desktop/sample_folder/*')
print(paths)

#### What if we have folders within folders?

In [None]:
paths = glob('/Users/Victor/Desktop/sample_folder/**/*', recursive=True)
print('\n'.join(paths))

#### And if we only wanted the CSV files?

In [None]:
paths = glob('/Users/Victor/Desktop/sample_folder/**/*.csv', recursive=True)
print('\n'.join(paths))
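
If you don't have a sample folder handy, the sketch below builds a small throwaway folder tree and runs the same recursive CSV search on it. The file and folder names are made up for illustration.

```python
import os
import tempfile
from glob import glob

# Build a small temporary folder tree (hypothetical file names)
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, 'nested'))
for relative_path in ['a.csv', 'b.txt', os.path.join('nested', 'c.csv')]:
    with open(os.path.join(root, relative_path), 'w') as f:
        f.write('')

# '**' matches zero or more folders when recursive=True
csv_paths = sorted(glob(os.path.join(root, '**', '*.csv'), recursive=True))
relative_csv_paths = [os.path.relpath(p, root) for p in csv_paths]
print(relative_csv_paths)
```

The text file is skipped, and the CSV file inside the nested folder is found alongside the one at the top level.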

___
## Numpy: A package for working with large matrix data.<br>
#### Let's create numpy arrays of evenly spaced values:

In [None]:
import numpy as np

# For decimal values:
a = np.linspace(0., 1., 11)
print(a)

# For integer values:
a = np.arange(0, 10)
print(a)

#### We can also convert a list to a numpy array:

In [None]:
my_list = [x for x in range(5, 15)]
b = np.array(my_list)
print(b)

#### Let's check the shape of our array:

In [None]:
print(b.shape)

#### We can join arrays end to end:

In [None]:
c = np.concatenate([a, b])
print(c)
print(c.shape)

#### If we wrap each array in a list, each one becomes a single row, so the arrays are stacked vertically (one per row) instead of being joined end to end:

In [None]:
c = np.concatenate([[a], [b]])
print(c)
print(c.shape)
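
Numpy also offers `np.vstack`, which states the same intent more directly. A short sketch, assuming two ten-element arrays like the ones above:

```python
import numpy as np

a = np.arange(0, 10)
b = np.arange(5, 15)

# Wrapping each array in a list turns it into a 1 x 10 row before joining
wrapped = np.concatenate([[a], [b]])

# np.vstack stacks the arrays as rows without the extra brackets
stacked = np.vstack([a, b])

print(stacked.shape)
print(np.array_equal(wrapped, stacked))
```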

#### We can transpose the matrix:

In [None]:
print(c.T)

#### When working with large matrices, repeatedly appending an array to the end of an output array is very slow, because each append copies the whole array. It is much faster to initialize an array that matches the size and shape of the desired output and fill in its values along the way. We can do this with *np.zeros()*. For example:<br>

In [None]:
# Don't do this (np.concatenate copies the whole array on every iteration):
# output = np.array([x for x in range(1, 11)])
# for x in my_list_of_arrays:
#     output = np.concatenate([output, x])

# Do this:

a = np.zeros([5, 10])
print(a, '\n')

values = [x for x in range(0, 50)]
split_list = [values[i:i + 10] for i in range(0, len(values), 10)]
print(split_list, '\n')

# Fill one preallocated row at a time
for i, row in enumerate(split_list):
    a[i] = row

print(a)
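
When the rows are already available as Python lists, another option is to collect them and convert once at the end. The sketch below, using the same 5 x 10 example data, checks that both approaches give the same array:

```python
import numpy as np

values = list(range(0, 50))
split_list = [values[i:i + 10] for i in range(0, len(values), 10)]

# Preallocate, then fill row by row
preallocated = np.zeros([5, 10])
for i, row in enumerate(split_list):
    preallocated[i] = row

# Alternative: convert the whole list of rows in a single call
converted = np.array(split_list, dtype=float)

print(converted.shape)
print(np.array_equal(preallocated, converted))
```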

#### We can save our numpy array:

In [None]:
outpath = '/Users/Victor/Desktop/a.npy'
np.save(outpath, a)

#### And we can load a numpy array:

In [None]:
datapath = '/Users/Victor/Desktop/a.npy'
a = np.load(datapath)
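
To check that nothing is lost along the way, here is a small round-trip sketch. The array contents and the temporary file path are assumptions made for illustration.

```python
import os
import tempfile

import numpy as np

# A small example array
original = np.arange(12).reshape(3, 4)

# Save to a temporary folder so the sketch runs on any machine
npy_path = os.path.join(tempfile.mkdtemp(), 'a.npy')
np.save(npy_path, original)

# Load it back and compare values and data type
loaded = np.load(npy_path)
print(np.array_equal(original, loaded))
print(loaded.dtype == original.dtype)
```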