# UCL AI Society Machine Learning Tutorials
### Session 01. Introduction to Numpy, Pandas and Matplotlib

### Contents
1. Numpy
2. Pandas
3. Matplotlib
4. EDA (Exploratory Data Analysis)

### Aim
At the end of this session, you will be able to:
- Understand the basics of numpy.
- Understand the basics of pandas.
- Understand the basics of matplotlib.
- Perform a simple EDA using the libraries above.

## 3. Matplotlib
Matplotlib is a Python data visualisation library. Its `pyplot` interface is modelled on MATLAB's plotting commands.

### 3.1 Basics of Matplotlib

In [None]:
# run this cell if you haven't installed matplotlib
!pip install matplotlib

- `%matplotlib inline` is an IPython magic command for Jupyter frontends such as Jupyter Notebook and Jupyter QtConsole. With this backend, plots are displayed inline, directly below the code cell that produces them.
- `%matplotlib tk` is an IPython magic command that selects the Tk backend. Plots open in a separate interactive window rather than in the browser (this requires Tkinter and a local display).
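
If it is unclear which backend is active, `matplotlib.get_backend()` reports its name. A minimal sketch; the printed name depends on your environment (for example, an inline backend in a notebook or `agg` on a machine without a display):

```python
import matplotlib

# Ask matplotlib which backend it is currently configured to use.
backend = matplotlib.get_backend()
print(backend)
```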

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
# %matplotlib tk

Plot a **y=sin(x)** graph.

In [None]:
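# define x from 0 up to (but not including) 3π in steps of 0.1, and y = sin(x)
x = np.arange(0, 3 * np.pi, 0.1)
y = np.sin(x)
print(x)
print(y)
# x and y must have the same length to be plotted against each other
assert len(x) == len(y)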

In [None]:
# plot a graph of sin(x)
plt.plot(x, y)
plt.xlabel('x')
plt.ylabel('y')
plt.grid()
plt.title('y = sin(x)')
# plt.savefig('./image/sinGraph.png')  # to save the figure, call savefig before show()
plt.show()
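
When saving a figure, the order of calls matters: `savefig` should come before `show`, because the figure may already be closed once `show` returns, which can leave you with a blank image. The sketch below illustrates this with the object-oriented interface (`fig`, `ax`). The in-memory buffer is an assumption made so that the example runs without an `image` folder; it is not part of the tutorial.

```python
import io

import matplotlib.pyplot as plt
import numpy as np

x = np.arange(0, 3 * np.pi, 0.1)
y = np.sin(x)

fig, ax = plt.subplots()
ax.plot(x, y)
ax.set_xlabel('x')
ax.set_ylabel('y')
ax.set_title('y = sin(x)')
ax.grid()

# Save while the figure still exists. An in-memory buffer stands in for a
# file path here; fig.savefig('./image/sinGraph.png') would write to disk
# instead (the folder must already exist).
buffer = io.BytesIO()
fig.savefig(buffer, format='png')
png_bytes = buffer.getvalue()

# Release the figure once it is no longer needed.
plt.close(fig)
```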

Plot multiple functions: **y = x**, **y = x^2**, and **y = x^3** in one graph.

In [None]:
# TODO: plot multiple graphs in one graph
x = np.arange(10)
x_linear = None
x_square = None
x_cubic = None

plt.plot(x, None)
plt.plot(x, None)
plt.plot(x, None)
plt.xlabel('x axis')
plt.ylabel('y axis')
plt.grid()
plt.legend(['y = x', 'y = x^2', 'y = x^3'])
plt.title('y = x | y = x^2 | y = x^3')
# TODO: print out the graph under this cell
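
One possible way to complete the exercise above is sketched below; try it yourself first. Each call to `plt.plot` draws onto the same axes, so three calls give three curves in one graph. Passing `label=` to each call is a design choice of this sketch, so that the legend entries cannot drift out of order.

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.arange(10)
x_linear = x
x_square = x ** 2
x_cubic = x ** 3

plt.figure()  # start from a fresh figure
plt.plot(x, x_linear, label='y = x')
plt.plot(x, x_square, label='y = x^2')
plt.plot(x, x_cubic, label='y = x^3')
plt.xlabel('x axis')
plt.ylabel('y axis')
plt.grid()
plt.legend()
plt.title('y = x | y = x^2 | y = x^3')

ax = plt.gca()  # keep a handle on the axes for later inspection
plt.show()
```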

### 3.2 Various Types of Plots
Matplotlib supports various types of plots, such as bar charts, histograms, scatter plots, area plots and pie charts. Let's use the IMDB-Movie-Data dataset again to get a better understanding of the data. Visualising data is a crucial part of EDA, which you'll get hands-on experience with soon!

In [None]:
# TODO: replace None with the pandas function that reads a CSV file
movie = pd.None("./data/IMDB-Movie-Data.csv")
movie.columns

#### 3.2.1 Scatter plot

Let's see if there is a positive correlation between `Rating` and `Revenue (Millions)` using the `scatter` function. The parameter `s` controls the size of the scattered dots (in matplotlib, the marker area in points squared) and `alpha` controls their opacity, from 0 (fully transparent) to 1 (fully opaque).

In [None]:
movie.plot.scatter(x='Rating', y='Revenue (Millions)', s=10, alpha=1)

The scatter plot suggests a weak positive relationship between the two variables.
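
A visual impression can be backed up with a number. The sketch below computes the Pearson correlation coefficient with `Series.corr`, and uses `alpha` below 1 so that overlapping points show up as darker regions. Note that `demo` is synthetic stand-in data generated for this example, not the IMDB dataset, so its coefficient says nothing about the real movies.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in data (NOT the IMDB dataset): revenue loosely rises with rating.
rng = np.random.default_rng(0)
rating = rng.uniform(1, 10, size=200)
revenue = 10 * rating + rng.normal(0, 30, size=200)
demo = pd.DataFrame({'Rating': rating, 'Revenue (Millions)': revenue})

# s sets the marker size; alpha < 1 makes dense regions appear darker.
ax = demo.plot.scatter(x='Rating', y='Revenue (Millions)', s=10, alpha=0.5)

# Pearson r lies in [-1, 1]: +1 is a perfect positive linear relationship,
# 0 means no linear relationship, -1 is a perfect negative one.
r = demo['Rating'].corr(demo['Revenue (Millions)'])
print(f"Pearson r = {r:.2f}")
```

On the real data, the same call would be `movie['Rating'].corr(movie['Revenue (Millions)'])`, assuming `movie` has been loaded as in the cell above.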

#### 3.2.2 Bar plot
Let's look at the most used programming, scripting, and markup languages in 2018, according to the Stack Overflow Developer Survey.  
Use `plt.bar` and `plt.xticks` to plot language against percentage.

In [None]:
# https://insights.stackoverflow.com/survey/2018#most-popular-technologies
language = ["JS", "HTML", "CSS", "SQL", "Java", "Shell", "Python", "C#"]
percentage = [69.8, 68.5, 65.1, 57.0, 45.3, 39.8, 38.8, 34.4]

# Positions of the bars along the x-axis, one per language
x_positions = range(len(percentage))

# TODO: Create a bar plot
plt.bar(None, None)
plt.xticks(None, None)
plt.xlabel('Language')
plt.ylabel('Percentage (%)')
plt.title("Most Used Programming, Scripting, and Markup Languages in 2018")
plt.show()
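
One possible way to complete the bar plot exercise is sketched below; try it yourself first. `plt.bar` takes the bar positions and heights, and `plt.xticks` then replaces the numeric positions with the language names. The variable name `positions` is chosen for this sketch only.

```python
import matplotlib.pyplot as plt

language = ["JS", "HTML", "CSS", "SQL", "Java", "Shell", "Python", "C#"]
percentage = [69.8, 68.5, 65.1, 57.0, 45.3, 39.8, 38.8, 34.4]

# One integer position per bar: 0, 1, ..., 7
positions = range(len(percentage))

plt.figure()  # start from a fresh figure
plt.bar(positions, percentage)   # bar i sits at position i with height percentage[i]
plt.xticks(positions, language)  # label position i with language[i]
plt.xlabel('Language')
plt.ylabel('Percentage (%)')
plt.title("Most Used Programming, Scripting, and Markup Languages in 2018")

ax = plt.gca()  # keep a handle on the axes for later inspection
plt.show()
```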

### 3.3 (Advanced) Matplotlib Exercise

In [None]:
# TODO: Draw four graphs in one figure
# Hint: https://matplotlib.org/api/_as_gen/matplotlib.pyplot.subplots.html
# The page above includes useful example code

fig, axes = plt.None(None, None)  # this is where you need to use subplots

# TODO: Scatter Graph (Upper left)
x = np.random.randn(50)
y = np.random.randn(50)
colors = np.random.randint(0, 100, 50)
sizes = 500 * np.pi * np.random.rand(50) ** 2
axes[None, None].scatter(None)


# TODO: Bar Graph (Upper right)
x = np.arange(10)
axes[None, None].bar(x, x ** 2)

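# Stacked bar graph (lower left). This part is complete: make sure you understand how it works!
# Each series is drawn on top of the element-wise sum of the previous ones via `bottom`.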
x = np.random.rand(3)
y = np.random.rand(3)
z = np.random.rand(3)
data = [x, y, z]

x_ax = np.arange(3)
for i in x_ax:
    axes[1, 0].bar(x_ax, data[i], bottom=np.sum(data[:i], axis=0))
axes[1, 0].set_xticks(x_ax)
axes[1, 0].set_xticklabels(['A', 'B', 'C'])

# TODO: Histogram Graph (Lower right)
data = np.random.randn(1000)
axes[None, None].hist(data, bins=40)

# TODO: Either show the image or save it to png file
None
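
One possible way to complete the exercise above is sketched below; try it yourself first. `plt.subplots(2, 2)` returns a figure and a 2 x 2 array of axes indexed as `axes[row, column]`. In the stacked bar panel, `bottom` is the element-wise sum of all earlier series, so each new series starts where the previous ones ended. The `figsize`, `alpha`, and `tight_layout` settings are assumptions added for readability and are not required by the exercise.

```python
import matplotlib.pyplot as plt
import numpy as np

fig, axes = plt.subplots(2, 2, figsize=(10, 8))  # axes[row, column]

# Scatter graph (upper left): c sets the colours, s the marker sizes
x = np.random.randn(50)
y = np.random.randn(50)
colors = np.random.randint(0, 100, 50)
sizes = 500 * np.pi * np.random.rand(50) ** 2
axes[0, 0].scatter(x, y, c=colors, s=sizes, alpha=0.5)

# Bar graph (upper right)
x = np.arange(10)
axes[0, 1].bar(x, x ** 2)

# Stacked bar graph (lower left)
data = [np.random.rand(3) for _ in range(3)]
x_ax = np.arange(3)
for i in x_ax:
    # Sum of series 0..i-1; for i == 0 the slice is empty and the sum is 0.0
    bottom = np.sum(data[:i], axis=0)
    axes[1, 0].bar(x_ax, data[i], bottom=bottom)
axes[1, 0].set_xticks(x_ax)
axes[1, 0].set_xticklabels(['A', 'B', 'C'])

# Histogram (lower right)
samples = np.random.randn(1000)
axes[1, 1].hist(samples, bins=40)

fig.tight_layout()  # reduce overlap between neighbouring panels
plt.show()
```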

### What to do next?
The following websites are helpful for further study of matplotlib:
- [DataCamp Matplotlib Tutorial: Python Plotting](https://www.datacamp.com/community/tutorials/matplotlib-tutorial-python)
- [Matplotlib official website](https://matplotlib.org/#)
- [Python Plotting With Matplotlib (Guide)](https://realpython.com/python-matplotlib-guide/)
- [Different plotting using pandas and matplotlib](https://www.geeksforgeeks.org/different-plotting-using-pandas-and-matplotlib/)
- [Matplotlib tutorial for beginners](https://github.com/rougier/matplotlib-tutorial)