
# **What is Matplotlib?**
- Matplotlib is a low-level graph plotting library in Python that serves as a visualization utility.
- The source code for Matplotlib is located at
[this GitHub repository](https://github.com/matplotlib/matplotlib)

In [None]:
# Import the matplotlib.pyplot module (conventionally as plt) and run this cell before any plotting cell.
import matplotlib.pyplot as plt

# **1. Line plot**

In [None]:
x = [1, 2, 3, 4, 5, 6]
y = [4, 10, 6, 16, 26, 20]
fig, ax = plt.subplots()
ax.plot(x, y)
ax.set(xlabel='x', ylabel='y', title='line graph')
plt.savefig('graph1.png')
plt.show()
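`ax.plot` can be called more than once on the same axes, and each call adds one more line. Options such as `marker` and `linestyle` help tell the lines apart, and `label=` feeds the legend. A minimal sketch, where the second series `y2` is made-up data for illustration only:

```python
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5, 6]
y = [4, 10, 6, 16, 26, 20]
y2 = [2, 8, 12, 14, 18, 24]  # hypothetical second series, for illustration only

fig, ax = plt.subplots()
# Each call to ax.plot adds one Line2D object to ax.lines
ax.plot(x, y, marker='o', linestyle='-', label='y')
ax.plot(x, y2, marker='s', linestyle='--', label='y2')
ax.set(xlabel='x', ylabel='y', title='Two lines on one axes')
ax.legend()  # built from the label= strings passed to ax.plot
plt.show()
```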

# **2. Bar chart**

In [None]:
x = ['A', 'B', 'C', 'D', 'E', 'F']
y = [4, 10, 6, 16, 26, 20]
fig, ax = plt.subplots()
ax.bar(x, y)
ax.set(xlabel='x', ylabel='y', title='Bar graph')
plt.savefig('graph2.png')
plt.show()

Bar chart with error bars

In [None]:
x = ['A', 'B', 'C', 'D', 'E', 'F']
y = [4, 10, 6, 16, 26, 20]
z = [2, 3, 1.5, 5, 12.5, 10]
fig, ax = plt.subplots()
ax.bar(x, y, yerr=z)
ax.set(xlabel='x', ylabel='y', title='Bar graph')
plt.savefig('graph2-1.png')
plt.show()
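With `yerr=z`, each error bar extends symmetrically by z above and below the top of its bar. `yerr` also accepts a 2×N sequence (first row: lower errors, second row: upper errors), and `capsize` adds short horizontal caps that make the bar ends easier to read. A sketch, assuming made-up `lower` and `upper` values:

```python
import matplotlib.pyplot as plt

x = ['A', 'B', 'C', 'D', 'E', 'F']
y = [4, 10, 6, 16, 26, 20]
lower = [1, 2, 1, 3, 5, 4]         # hypothetical lower errors
upper = [2, 3, 1.5, 5, 12.5, 10]   # hypothetical upper errors

fig, ax = plt.subplots()
# Row 0 of yerr is drawn below each bar top, row 1 above it
ax.bar(x, y, yerr=[lower, upper], capsize=4)
ax.set(xlabel='x', ylabel='y', title='Bar graph with asymmetric error bars')
plt.show()
```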

# **3. Scatter plot**

In [None]:
x = [1, 2, 3, 4, 5, 6]
y = [4, 10, 16, 24, 32, 40]
fig, ax = plt.subplots()
ax.scatter(x, y)
ax.set(xlabel='x', ylabel='y', title='Scatter plot')
plt.savefig('graph3.png')
plt.show()
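`ax.scatter` can also encode extra information per point: `c=` maps values to colors through a colormap and `s=` sets each marker's area (in points²). A sketch, where the `sizes` list and the choice of coloring by `y` are assumptions for illustration:

```python
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5, 6]
y = [4, 10, 16, 24, 32, 40]
sizes = [20, 40, 60, 80, 100, 120]  # hypothetical marker areas (points^2)

fig, ax = plt.subplots()
# c= colors each point by its y value; s= scales each marker
sc = ax.scatter(x, y, c=y, s=sizes, cmap='viridis')
fig.colorbar(sc, ax=ax, label='y value')  # legend for the color mapping
ax.set(xlabel='x', ylabel='y', title='Scatter plot with color and size')
plt.show()
```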


# **4. Pie chart**

In [None]:
labels = ['A', 'B', 'C', 'D']
y = [4, 10, 24, 32]
fig, ax = plt.subplots()
ax.pie(y, labels=labels)
ax.set_title('Pie chart')
plt.savefig('graph4.png')
plt.show()
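Each wedge's angle is proportional to its share of the total. Here the total is 4 + 10 + 24 + 32 = 70, so wedge A covers 4/70 ≈ 5.7% of the circle. The sketch below computes the shares by hand and then uses the `autopct` format string to print them on the wedges:

```python
import matplotlib.pyplot as plt

labels = ['A', 'B', 'C', 'D']
y = [4, 10, 24, 32]

# Percentage of the total for each wedge, rounded to one decimal place
total = sum(y)
shares = [round(100 * v / total, 1) for v in y]
print(shares)  # [5.7, 14.3, 34.3, 45.7]

fig, ax = plt.subplots()
# With autopct set, pie returns (wedges, label texts, percentage texts)
wedges, texts, autotexts = ax.pie(y, labels=labels, autopct='%1.1f%%')
ax.set_title('Pie chart with percentages')
plt.show()
```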

# **5. Histogram**

In [None]:
import numpy as np
data = np.random.normal(loc=3, scale=2, size=100)  # draw 100 random values from a normal distribution (mean 3, std 2)
fig, ax = plt.subplots()
ax.hist(data, bins=10)
# bins is the number of equal-width class intervals
ax.set(xlabel='Value', ylabel='Frequency', title='Histogram')
plt.savefig('graph5.png')
plt.show()
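`ax.hist` returns the counts and bin edges it drew, which is a handy way to check what `bins=10` means: 10 intervals need 11 edges, and the counts add up to the number of data points. A sketch; the seeded generator is an assumption added only to make the random draw repeatable:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(seed=0)  # arbitrary seed, for reproducibility only
data = rng.normal(loc=3, scale=2, size=100)

fig, ax = plt.subplots()
# counts[i] = number of values in [edges[i], edges[i+1]); the last bin also includes its right edge
counts, edges, patches = ax.hist(data, bins=10)
print(len(counts), len(edges), int(counts.sum()))  # 10 11 100
plt.show()
```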

# **6. Box and whisker plot**

A single box plot

In [None]:
x1 = [25, 28, 29, 29, 30, 34, 35, 35, 37, 38]
fig, ax = plt.subplots()
ax.boxplot(x1)
ax.set(xlabel='x', ylabel='y', title='Box plot')
plt.savefig('graph6-1.png')
plt.show()
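The box spans the first to the third quartile (Q1 to Q3) with a line at the median, and with matplotlib's default `whis=1.5` the whiskers end at the most extreme data points lying within 1.5 × IQR of the box. The sketch below computes these numbers for `x1` with NumPy; it assumes `np.percentile`'s default linear interpolation, which is also what matplotlib uses for its default quartiles:

```python
import numpy as np

x1 = [25, 28, 29, 29, 30, 34, 35, 35, 37, 38]

q1, median, q3 = np.percentile(x1, [25, 50, 75])  # linear interpolation by default
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
# Whiskers stop at the most extreme data points still inside the fences
whisker_low = min(v for v in x1 if v >= lower_fence)
whisker_high = max(v for v in x1 if v <= upper_fence)
print(q1, median, q3, iqr)        # 29.0 32.0 35.0 6.0
print(whisker_low, whisker_high)  # 25 38
```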

A side-by-side box plot

In [None]:
x1 = [25, 28, 29, 29, 30, 34, 35, 35, 37, 38]
x2 = [20, 22, 29, 30, 32, 28, 29, 31, 27, 28]
fig, ax = plt.subplots()
ax.boxplot([x1, x2])
ax.set(xlabel='x', ylabel='y', title='Box plot')
plt.xticks([1,2], labels=['x1','x2'])
plt.savefig('graph6-2.png')
plt.show()
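Values beyond the 1.5 × IQR fences are drawn as individual outlier markers ("fliers") instead of being covered by the whiskers. Applying the same rule to `x2` (again assuming NumPy's default linear percentile method) flags its two lowest values, which is why they show up as separate points below the second box:

```python
import numpy as np

x2 = [20, 22, 29, 30, 32, 28, 29, 31, 27, 28]

q1, q3 = np.percentile(x2, [25, 75])
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
# Anything outside [lower_fence, upper_fence] is plotted as an outlier
outliers = [v for v in x2 if v < lower_fence or v > upper_fence]
print(q1, q3, lower_fence, upper_fence)  # 27.25 29.75 23.5 33.5
print(outliers)                          # [20, 22]
```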

# **7. Overlaying multiple data sets in one plot**

In [None]:
x = [1, 2, 3, 4, 5, 6]
y1 = [4, 10, 6, 16, 26, 20]
y2 = [4, 10, 6, 16, 26, 20]
fig, ax = plt.subplots()
ax.bar(x, y1, label='y1')  # plot for y1
ax.plot(x, y2, label='y2', color='red')  # plot for y2
ax.set(xlabel='x', ylabel='y', title='overlayed graph')
plt.legend()  # show legends
plt.savefig('graph7.png')
plt.show()
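The same overlaying idea works for two distributions: draw two histograms on one axes, give them shared bin edges so the bars line up, and set `alpha` below 1 so the overlapping region stays visible. A sketch on hypothetical, randomly generated scores (the group names, means, spreads, and seed are all assumptions):

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(seed=1)               # arbitrary seed
group_a = rng.normal(loc=60, scale=10, size=200)  # hypothetical scores
group_b = rng.normal(loc=70, scale=8, size=200)   # hypothetical scores

# Same bin edges for both groups so the bars are directly comparable
edges = np.histogram_bin_edges(np.concatenate([group_a, group_b]), bins=15)

fig, ax = plt.subplots()
# alpha=0.5 makes each set of bars semi-transparent
ax.hist(group_a, bins=edges, alpha=0.5, label='group A')
ax.hist(group_b, bins=edges, alpha=0.5, label='group B')
ax.set(xlabel='score', ylabel='Frequency', title='Overlaid histograms')
ax.legend()
plt.show()
```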

**Exercise 1:** Use the data from the link below to do the following tasks:

**data:** https://raw.githubusercontent.com/Phorutai/deposited_csv_data/refs/heads/main/StudentsPerformance.csv

1. Find the mean and standard deviation (std) of the score for male and female students.
2. Create a bar plot to visualize the mean and std of the score for male and female students.
3. Find the mean and std of the score for the four ethnic groups (A, B, C, and D).
4. Create a bar plot to visualize the mean and std of the score for the four ethnic groups.
5. Visualize the distribution of scores for the four ethnic groups using side-by-side box plots.
6. Create overlaid histograms to show the score distributions of male and female students.

In [None]:
# import needed modules
import pandas as pd
import matplotlib.pyplot as plt
# read the data from a CSV file
df = pd.read_csv('https://raw.githubusercontent.com/Phorutai/deposited_csv_data/refs/heads/main/StudentsPerformance.csv')
df

In [None]:
"""
1. Find the mean and standard deviation (std) of the score for male and female students
"""
df_male = df[df['gender'] == 'male']
df_female = df[df['gender'] == 'female']
male_stat = df_male['score'].describe()
female_stat = df_female['score'].describe()
mean_score = [male_stat['mean'], female_stat['mean']]
std_score = [male_stat['std'], female_stat['std']]
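pandas can compute the same statistics for every group in one step with `groupby` followed by `agg`; as with `describe`, `std` here is the sample standard deviation (ddof=1). The small DataFrame below is made up so the sketch runs on its own; on the real data, assuming the `gender` and `score` columns used above, the analogous call would be `df.groupby('gender')['score'].agg(['mean', 'std'])`.

```python
import pandas as pd

# Hypothetical stand-in for the StudentsPerformance data
toy = pd.DataFrame({
    'gender': ['male', 'female', 'male', 'female'],
    'score':  [70, 85, 80, 75],
})

# One row per group (sorted by group name), one column per statistic
stats = toy.groupby('gender')['score'].agg(['mean', 'std'])
print(stats)
```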

In [None]:
"""
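2. Create a bar plot to visualize the mean and std of the score for male and female students
"""
fig, ax = plt.subplots()
x = ['male', 'female']
y = mean_score
error_bar = std_score  # bar height = mean, error bar = ±1 std
ax.bar(x, y, yerr=error_bar, capsize=4)
ax.set(xlabel='gender', ylabel='score', title='Mean score by gender (error bars: ±1 std)')
plt.show()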

In [None]:
# Write code here. Feel free to add more code cells.

# **What is Seaborn?**
- Seaborn is a Python library for plotting statistical graphics.
- It provides attractive default styles and color palettes for statistical plots.
- It is built on top of Matplotlib and is closely integrated with pandas data structures.
- The source code for Seaborn is located at
[this GitHub repository](https://github.com/mwaskom/seaborn)
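Because Seaborn works directly with pandas DataFrames, you pass the DataFrame as `data=` and give column names for `x` and `y`; Seaborn groups the rows by the `x` column and labels the axes with the column names. A minimal sketch on a made-up long-format table (the `group` and `value` columns are hypothetical):

```python
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Hypothetical long-format data: one row per observation
toy = pd.DataFrame({
    'group': ['A'] * 5 + ['B'] * 5,
    'value': [25, 28, 29, 30, 34, 20, 27, 28, 29, 31],
})

fig, ax = plt.subplots()
# One box per distinct value in the 'group' column
sns.boxplot(data=toy, x='group', y='value', ax=ax)
plt.show()
```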

Use the same data set as in Exercise 1 to visualize the data with Seaborn:
1. Create a bar plot to visualize the mean and std of the score for the four ethnic groups.
2. Create side-by-side box plots to visualize the score distribution of the four ethnic groups.

In [None]:
import seaborn as sns

In [None]:
# 1. Create a bar plot to visualize the mean and std of the score for the four ethnic groups
# How do you sort a DataFrame by the values in the column 'ethnicity'?
# write code to sort data here
# errorbar='sd' draws ±1 standard deviation (seaborn >= 0.12); the default is a 95% confidence interval
sns.barplot(data=df, x='ethnicity', y='score', errorbar='sd')
plt.show()

In [None]:
# 2. Create side-by-side box plots to visualize the score distribution of the four ethnic groups
# write code here