### Installation in Anaconda: pip install matplotlib

[Useful tutorial](https://matplotlib.org/stable/index.html)

In [None]:
import matplotlib.pyplot as plt

### Load data

In [None]:
import pandas as pd

In [None]:
download_url = (
    'https://raw.githubusercontent.com/fivethirtyeight/'
    'data/master/college-majors/recent-grads.csv'
)

df = pd.read_csv(download_url)

In [None]:
type(df)

### About data

In [None]:
df.info()

In [None]:
df.head()

In [None]:
df.shape

In [None]:
df.describe()

### Important columns:
* 'Median' is the median earnings of full-time, year-round workers.
* 'P25th' is the 25th percentile of earnings.
* 'P75th' is the 75th percentile of earnings.
* 'Rank' is the major's rank by median earnings.

### First graph

In [None]:
# The %matplotlib magic command sets up your Jupyter Notebook for displaying plots with Matplotlib 
%matplotlib

In [None]:
df.plot(x='Rank', y=['P25th', 'Median', 'P75th'])

In [None]:
plt.show()

In [None]:
# Display plots inline in the notebook (a trailing comment on the magic line would be parsed as an argument)
%matplotlib inline

In [None]:
df.plot(x='Rank', y=['P25th', 'Median', 'P75th'])

### Plot kind:

The default value is 'line'.

* 'area' is for area plots.
* 'bar' is for vertical bar charts.
* 'barh' is for horizontal bar charts.
* 'box' is for box plots.
* 'hexbin' is for hexbin plots.
* 'hist' is for histograms.
* 'kde' is for kernel density estimate charts.
* 'density' is an alias for 'kde'.
* 'line' is for line graphs.
* 'pie' is for pie charts.
* 'scatter' is for scatter plots.

In [None]:
df.plot(x='Rank', y=['P25th', 'Median', 'P75th'], kind='bar')

### Equivalent way

In [None]:
# For DataFrame

In [None]:
df.plot.bar(x='Rank', y=['P25th', 'Median', 'P75th'])
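
A minimal sketch of the equivalence above, using a small made-up frame instead of the downloaded data (the `toy` values are hypothetical). It draws the same bars once with `kind='bar'` and once with the `.plot.bar()` accessor, then compares the bar heights. The `Agg` backend is chosen here only so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no window needed
import pandas as pd

# Hypothetical stand-in for a few rows of the recent-grads data
toy = pd.DataFrame({
    "Rank": [1, 2, 3],
    "P25th": [50000, 40000, 30000],
    "P75th": [90000, 70000, 60000],
})

# Same plot requested in two ways
ax_kind = toy.plot(x="Rank", y=["P25th", "P75th"], kind="bar")
ax_method = toy.plot.bar(x="Rank", y=["P25th", "P75th"])

# Each bar is a rectangle patch on the Axes; collect the heights
heights_kind = [float(p.get_height()) for p in ax_kind.patches]
heights_method = [float(p.get_height()) for p in ax_method.patches]

print(heights_kind == heights_method)
```

Both calls should go through the same pandas plotting code, so which one to use is mostly a matter of taste; the accessor form has the advantage of tab completion.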

In [None]:
# How does it work?

In [None]:
plt.plot(df['Rank'], df['P75th'])

In [None]:
# pandas .plot() is a wrapper for pyplot.plot()
# and the result is a graph identical to the one you produced with Matplotlib

[Wrapper functions](https://www.pythonpool.com/python-wrappers/#:~:text=Trending%20Python%20Articles-,What%20are%20Wrappers%20in%20Python%3F,are%20also%20known%20as%20decorators.)
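
To make the wrapper idea concrete, here is a small sketch with made-up numbers (the `toy` frame is hypothetical). It plots the same column once through pandas and once directly through Matplotlib, then checks that the line drawn on each Axes carries the same x and y data. Styling details such as the legend and axis label may still differ between the two.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no window needed
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-in for the Rank and P75th columns
toy = pd.DataFrame({"Rank": [1, 2, 3, 4], "P75th": [90000, 70000, 65000, 50000]})

# pandas route: returns a Matplotlib Axes object
ax_pandas = toy.plot(x="Rank", y="P75th")

# Direct Matplotlib route on a fresh figure
fig, ax_mpl = plt.subplots()
ax_mpl.plot(toy["Rank"], toy["P75th"])

# Compare the data stored in the first line of each Axes
pandas_line = ax_pandas.get_lines()[0]
mpl_line = ax_mpl.get_lines()[0]
same_x = np.array_equal(np.asarray(pandas_line.get_xdata()), np.asarray(mpl_line.get_xdata()))
same_y = np.array_equal(np.asarray(pandas_line.get_ydata()), np.asarray(mpl_line.get_ydata()))

print(type(ax_pandas).__module__, same_x, same_y)
```

Because `.plot()` hands back a regular Axes, any further Matplotlib customization (titles, limits, annotations) can be applied to the result.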

In [None]:
df.plot(x='Rank', y='P75th')

In [None]:
df.plot??

In [None]:
# For Series

In [None]:
s = df['Median']

In [None]:
s

In [None]:
type(s)

In [None]:
s.plot(kind='hist')

In [None]:
plt.hist(s)
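
Both histogram calls above bin the values before drawing them. As a rough sketch of that binning step, assuming the default of 10 equal-width bins, `numpy.histogram` can be run on made-up values (the `values` array is hypothetical, not the `Median` column):

```python
import numpy as np

# Hypothetical data: the integers 0..99
values = np.arange(100)

# 10 equal-width bins between the minimum and the maximum
counts, edges = np.histogram(values, bins=10)

print(counts)       # number of values falling in each bin
print(len(edges))   # one more edge than there are bins
```

`plt.hist` also returns the counts and bin edges, so the same numbers can be inspected after plotting.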

## Data analysis using matplotlib

### Detect outliers

Informal definition: <br> A statistical outlier is any point in a data set that lies outside the expected distribution of the data. It usually represents an abnormal value that should be excluded, either because the value is incorrect or because it would strongly influence the analysis.
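
The cells that follow use a fixed cut-off on `Median`. As an alternative illustration only, the sketch below applies the common 1.5 × IQR rule (Tukey's fences), which is not the method used in this notebook. The earnings values are made up.

```python
import pandas as pd

# Hypothetical median earnings, with one value far above the rest
median = pd.Series(
    [22000, 30000, 33000, 35000, 36000, 40000, 45000, 110000],
    name="Median",
)

# Quartiles and interquartile range
q1 = median.quantile(0.25)
q3 = median.quantile(0.75)
iqr = q3 - q1

# Fences: anything outside [lower, upper] is flagged as an outlier
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr
outliers = median[(median < lower) | (median > upper)]

print(outliers.tolist())
```

A rule like this adapts to the spread of the data, whereas a fixed threshold is easier to explain; which is more appropriate depends on the analysis.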

In [None]:
df_outliers = df.loc[df['Median']>70000]

In [None]:
df_outliers.shape

In [None]:
df_outliers.plot(x='Major', y='Median', kind='bar', rot=5, fontsize=4)

In [None]:
df_outliers['Major']

### Check correlation

In [None]:
df.plot(x='Median', y='Unemployment_rate', kind='scatter')

In [None]:
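# Pearson correlation; keep only numeric columns, since recent pandas versions raise on text columns
df.select_dtypes('number').corr()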

$$
  r =
  \frac{ \sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y}) }{%
        \sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2}\sqrt{\sum_{i=1}^{n}(y_i-\bar{y})^2}}
$$
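
The formula can be checked by hand on a few made-up points. The sketch below computes $r$ term by term and compares it with `Series.corr`, which uses Pearson correlation by default. The `x` and `y` values are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical paired observations
x = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])
y = pd.Series([2.0, 4.0, 5.0, 4.0, 5.0])

# Deviations from the means
dx = x - x.mean()
dy = y - y.mean()

# Numerator and denominator of the formula
numerator = (dx * dy).sum()
denominator = np.sqrt((dx ** 2).sum()) * np.sqrt((dy ** 2).sum())
r_manual = numerator / denominator

# pandas result for comparison
r_pandas = x.corr(y)

print(round(r_manual, 4), round(r_pandas, 4))
```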

In [None]:
# Restrict to numeric columns before computing the correlation matrix
corr_matrix = df.select_dtypes('number').corr()

In [None]:
corr_matrix.style.background_gradient(cmap='coolwarm')

In [None]:
import seaborn as sn

[Seaborn vs Matplotlib](https://www.geeksforgeeks.org/difference-between-matplotlib-vs-seaborn/#:~:text=Seaborn%20is%20more%20comfortable%20in,provide%20beautiful%20graphics%20in%20python.&text=Matplotlib%20works%20efficiently%20with%20data,various%20stateful%20APIs%20for%20plotting.)

In [None]:
sn.heatmap(corr_matrix)

### Group categorical data

In [None]:
df['Major_category'].unique()

In [None]:
major_totals = df.groupby('Major_category')['Total'].sum().sort_values()

In [None]:
major_totals
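
The grouping step above can be followed on a tiny made-up frame (the rows in `toy` are hypothetical): rows are grouped by category, `Total` is summed within each group, and the result is sorted in ascending order.

```python
import pandas as pd

# Hypothetical majors with their category and number of graduates
toy = pd.DataFrame({
    "Major_category": ["Engineering", "Arts", "Engineering", "Business", "Arts"],
    "Total": [100, 20, 50, 70, 5],
})

# Sum Total per category, then sort from smallest to largest
toy_totals = toy.groupby("Major_category")["Total"].sum().sort_values()

print(toy_totals)
```

The result is a Series indexed by category, which is why it can be passed straight to `.plot(kind='barh')`.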

In [None]:
len(major_totals)

In [None]:
major_totals.plot(kind='barh', fontsize=4)

In [None]:
# Plot only Engineering

In [None]:
df_Engineering = df[df['Major_category']=='Engineering']

In [None]:
df_Engineering['Total'].plot(kind='hist')

In [None]:
df_Engineering.head()

[Dummy variables](https://www.geeksforgeeks.org/how-to-create-dummy-variables-in-python-with-pandas/)

In [None]:
pd.get_dummies(df['Major_category'])
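
To see what the dummy encoding produces, here is a small sketch on made-up categories (the `toy` Series is hypothetical). Passing `dtype=int` is a choice made here so the output shows 0 and 1 regardless of pandas version; the call above uses the default dtype.

```python
import pandas as pd

# Hypothetical category values
toy = pd.Series(["Engineering", "Arts", "Business", "Arts"], name="Major_category")

# One column per distinct category, 1 where the row belongs to it
dummies = pd.get_dummies(toy, dtype=int)

print(dummies)
```

Each row contains exactly one 1, in the column matching its original category.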