#### Practice Exercise: Data Visualization with Matplotlib

##### 1. Scatter Plot:

 a. Basic Scatter Plot:  
 Plot total_bill (x-axis) vs. tip (y-axis).

In [None]:
# Import necessary libraries
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

In [None]:
# Load the tips dataset

df = pd.read_csv('tips.csv')

In [None]:
df.head()

In [None]:
df.info()

In [None]:
df.nunique()

In [None]:
plt.figure(figsize=(8, 6))
plt.scatter(df['total_bill'], df['tip'])
plt.title('Total Bill vs. Tip')
plt.xlabel('Total Bill')
plt.ylabel('Tip')
plt.show()

 b. Enhance the Plot:
Add the following parameters:  
Transparency: 0.6  
Color: 'blue'  
Colormap: 'viridis'  
Size: 50

In [None]:
plt.figure(figsize=(8, 6))
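# Note: cmap only takes effect when c is an array of numbers. With a single color
# such as 'blue', Matplotlib ignores cmap (and may warn), so it is passed here only for practice.
plt.scatter(df['total_bill'], df['tip'], alpha=0.6, c='blue', s=50, cmap='viridis')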
plt.title('Total Bill vs. Tip with Enhancements')
plt.xlabel('Total Bill')
plt.ylabel('Tip')
plt.show()

c. Reflect Additional Data:
Modify the parameters to represent more data:  
Use size to reflect the size column (number of people).  
Use color (c) to reflect the time column (Lunch/Dinner), assigning numerical values (e.g., Lunch=0, Dinner=1)  
Apply the 'plasma' colormap (cmap).  

In [None]:
#time_numeric = df['time'].apply(lambda x: 0 if x == 'Lunch' else 1) #this is an option but it's better to use select

In [None]:
# Define the conditions
conditions = [
    df['time'] == 'Lunch',
    df['time'] == 'Dinner'
]

# Define the corresponding values
values = [0, 1]

# Apply np.select to create the new column
df['time_numeric'] = np.select(conditions, values, default=np.nan)


In [None]:
plt.figure(figsize=(8, 6))
scatter = plt.scatter(df['total_bill'], df['tip'], alpha=0.6, c=df['time_numeric'], s=df['size']*20, cmap='plasma')
plt.title('Total Bill vs. Tip with Size and Time')
plt.xlabel('Total Bill')
plt.ylabel('Tip')
plt.colorbar(scatter, ticks=[0, 1], label='Time (0=Lunch, 1=Dinner)')
plt.show()

d.
Experiment with different colormaps available in Matplotlib to see how they affect the visualization.  
Try using other columns, such as day or sex, to define colors or sizes and observe the patterns that emerge.
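
One possible starting point for this experiment is sketched below. It draws the same scatter plot with three colormaps side by side and colors the points by day. The `demo` DataFrame and its values are a hypothetical stand-in so the cell runs on its own; in the notebook you would use `df` instead. The choice of colormaps and of `pd.Categorical` for the encoding are assumptions, not the only option.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for the tips data; replace `demo` with `df` in the notebook.
demo = pd.DataFrame({
    'total_bill': [10.3, 21.0, 16.9, 35.2, 24.6, 48.3],
    'tip': [1.7, 3.5, 2.0, 5.0, 3.6, 9.0],
    'size': [2, 3, 2, 4, 4, 6],
    'day': ['Thur', 'Fri', 'Sat', 'Sun', 'Sat', 'Sun'],
})

# Encode each day as an integer code so it can drive the color.
day_order = ['Thur', 'Fri', 'Sat', 'Sun']
day_codes = pd.Categorical(demo['day'], categories=day_order).codes

# Draw the same data once per colormap to compare them directly.
colormaps = ['viridis', 'plasma', 'coolwarm']
fig, axes = plt.subplots(1, len(colormaps), figsize=(15, 4), sharey=True)
for ax, name in zip(axes, colormaps):
    points = ax.scatter(demo['total_bill'], demo['tip'], c=day_codes,
                        s=demo['size'] * 20, cmap=name, alpha=0.8)
    ax.set_title(f"cmap='{name}'")
    ax.set_xlabel('Total Bill')
    cbar = fig.colorbar(points, ax=ax, ticks=range(len(day_order)))
    cbar.ax.set_yticklabels(day_order)
axes[0].set_ylabel('Tip')
plt.show()
```

Sequential colormaps such as 'viridis' suggest an ordering, while a diverging one such as 'coolwarm' emphasizes the two ends, so the same categories can look quite different.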

##### 2. Second Scatter Plot:

a. Create a Scatter Plot:
Plot total_bill (x-axis) vs. size (y-axis).

In [None]:
plt.figure(figsize=(8, 6))
plt.scatter(df['total_bill'], df['size'])
plt.title('Total Bill vs. Size')
plt.xlabel('Total Bill')
plt.ylabel('Size')
plt.show()

b. Enhance the Plot:
Add the following parameters:  
Transparency: 0.7  
Size: 80  
Color: 'green'  
Titles: Add appropriate titles for the plot and axes.

In [None]:
plt.figure(figsize=(8, 6))
plt.scatter(df['total_bill'], df['size'], alpha=0.7, c='green', s=80)
plt.title('Total Bill vs. Size with Enhancements')
plt.xlabel('Total Bill')
plt.ylabel('Size')
plt.show()

c. Reflect Additional Data:  
Adjust parameters to represent more data:  
Use size to reflect the tip column.  
Use color to reflect the day column, assigning numerical values (e.g., Thur=0, Fri=1, Sat=2, Sun=3).  
Apply the 'inferno' colormap (cmap).  

In [None]:
# One way: Series.map
day_numeric = df['day'].map({'Thur': 0, 'Fri': 1, 'Sat': 2, 'Sun': 3})
# We didn't talk about map. It uses a dictionary to map values.
# It is a good fit for categorical data.
# It is vectorized.
# You can read more about it here: https://pandas.pydata.org/docs/reference/api/pandas.Series.map.html

In [None]:
#another way:
conditions = [
    df['day'] == 'Thur',
    df['day'] == 'Fri',
    df['day'] == 'Sat',
    df['day'] == 'Sun'
]
choices = [0, 1, 2, 3]

df['day_numeric'] = np.select(conditions, choices, default=np.nan)
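
The two encodings (a dictionary passed to `map`, and `np.select`) should produce the same numbers. The short check below is a sketch on a hypothetical five-value Series, where 'Mon' is deliberately left out of the mapping to show what happens to unexpected values.

```python
import numpy as np
import pandas as pd

# Hypothetical sample; 'Mon' has no entry in the mapping.
days = pd.Series(['Thur', 'Fri', 'Sat', 'Sun', 'Mon'])

# Approach 1: dictionary lookup. Unmapped values become NaN.
via_map = days.map({'Thur': 0, 'Fri': 1, 'Sat': 2, 'Sun': 3})

# Approach 2: np.select. Rows matching no condition get the default (NaN here).
conditions = [days == 'Thur', days == 'Fri', days == 'Sat', days == 'Sun']
via_select = pd.Series(np.select(conditions, [0, 1, 2, 3], default=np.nan))

# Series.equals treats NaN values in the same position as equal.
print(via_map.equals(via_select))
print(via_map.isna().sum())  # number of values that could not be encoded
```

Counting the NaN values after encoding is a quick way to spot a misspelled or unexpected category before plotting.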

In [None]:
plt.figure(figsize=(8, 6))
scatter = plt.scatter(df['total_bill'], df['size'], alpha=0.7, c=day_numeric, s=df['tip']*20, cmap='inferno')
plt.title('Total Bill vs. Size with Tip and Day')
plt.xlabel('Total Bill')
plt.ylabel('Size')
cbar = plt.colorbar(scatter, ticks=[0, 1, 2, 3])
cbar.ax.set_yticklabels(['Thur', 'Fri', 'Sat', 'Sun'])
plt.show()

d.  
Modify the marker styles (e.g., circles, squares) to differentiate data points.  
Incorporate annotations for specific data points to highlight outliers or interesting cases.
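
A minimal sketch of both ideas follows, assuming you want to split the points by the time column and label the largest bill. The `demo` DataFrame is a hypothetical stand-in for `df`, and the marker choices and annotation offsets are arbitrary.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for the tips data; replace `demo` with `df` in the notebook.
demo = pd.DataFrame({
    'total_bill': [10.3, 21.0, 16.9, 35.2, 24.6, 48.3],
    'size': [2, 3, 2, 4, 4, 6],
    'time': ['Lunch', 'Dinner', 'Lunch', 'Dinner', 'Dinner', 'Dinner'],
})

# One scatter call per group, because `marker` takes a single style per call.
markers = {'Lunch': 'o', 'Dinner': 's'}  # circle, square
plt.figure(figsize=(8, 6))
for time_value, marker in markers.items():
    group = demo[demo['time'] == time_value]
    plt.scatter(group['total_bill'], group['size'], marker=marker,
                s=80, alpha=0.7, label=time_value)

# Annotate the largest bill as an example of highlighting an outlier.
top = demo.loc[demo['total_bill'].idxmax()]
plt.annotate('Largest bill', xy=(top['total_bill'], top['size']),
             xytext=(top['total_bill'] - 15, top['size'] - 1),
             arrowprops=dict(arrowstyle='->'))
plt.title('Total Bill vs. Size by Time')
plt.xlabel('Total Bill')
plt.ylabel('Size')
plt.legend()
plt.show()
```

Here `xy` is the point being labeled and `xytext` is where the text sits; adjust the offsets so the label does not cover other points.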

##### 3. Histogram:
a. Basic Histogram:  
Create a histogram of the total_bill column.

In [None]:
plt.figure(figsize=(8, 6))
plt.hist(df['total_bill'])
plt.title('Histogram of Total Bill')
plt.xlabel('Total Bill')
plt.ylabel('Frequency')
plt.show()

b. Customize the Histogram:  
Change the number of bins to 20.  
Set the bar color to 'skyblue'.  
Set the edge color of the bars to 'black'.  

In [None]:
plt.figure(figsize=(8, 6))
plt.hist(df['total_bill'], bins=20, color='skyblue', edgecolor='black')
plt.title('Histogram of Total Bill with Customizations')
plt.xlabel('Total Bill')
plt.ylabel('Frequency')
plt.show()

c. Repeat with Another Column:
Create a histogram for the tip column.  
Set the bar color to 'lightgreen'.  
Set the edge color to 'darkgreen'.  

In [None]:
plt.figure(figsize=(8, 6))
plt.hist(df['tip'], bins=20, color='lightgreen', edgecolor='darkgreen')
plt.title('Histogram of Tip with Customizations')
plt.xlabel('Tip')
plt.ylabel('Frequency')
plt.show()

d. Display Both Histograms Together, one on top of the other.  
Adjust the labels and the transparency (alpha) of the histograms so that the overlapping distributions are visible.

In [None]:
plt.figure(figsize=(8, 6))
# Each label must match the column being plotted
plt.hist(df['total_bill'], bins=20, color='skyblue', alpha=0.5, edgecolor='black', label='total bill')
plt.hist(df['tip'], bins=20, color='lightgreen', alpha=0.5, edgecolor='darkgreen', label='tip')

# A second plt.title call would overwrite the first, so use one combined title
plt.title('Histograms of Total Bill and Tip')
plt.xlabel('Amount')
plt.ylabel('Frequency')
plt.legend()

plt.show()

e. 
Explore the effect of different bin sizes on the representation of data distribution.
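
One way to explore this is to draw the same data with several bin counts side by side, as in the sketch below. The `bills` array is a hypothetical right-skewed random sample used only so the cell runs on its own; in the notebook you would pass `df['total_bill']` instead. The bin counts 5, 20, and 60 are arbitrary.

```python
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical right-skewed sample standing in for df['total_bill'].
rng = np.random.default_rng(0)
bills = rng.gamma(shape=4.0, scale=5.0, size=244)

bin_counts = [5, 20, 60]
fig, axes = plt.subplots(1, len(bin_counts), figsize=(15, 4), sharex=True)
for ax, bins in zip(axes, bin_counts):
    # hist returns the count per bin and the bin edges along with the bars
    counts, edges, _ = ax.hist(bills, bins=bins, color='skyblue', edgecolor='black')
    ax.set_title(f'bins={bins}')
    ax.set_xlabel('Total Bill')
axes[0].set_ylabel('Frequency')
plt.show()
```

Very few bins hide the shape of the distribution, while very many bins make it noisy; the total count is the same in every panel.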

##### 4. Bar Plot:

a. Basic Bar Plot:  
Create a bar plot showing the average tip amount for each day of the week.

In [None]:
avg_tip_per_day = df.groupby('day')['tip'].mean()
plt.figure(figsize=(8, 6))
plt.bar(avg_tip_per_day.index, avg_tip_per_day.values)

plt.show()

b. Enhance the Bar Plot:  
Change the bar color to 'coral'.  
Set the edge color to 'black'.  
Add titles and labels to the axes.  

In [None]:
plt.bar(avg_tip_per_day.index, avg_tip_per_day.values, color='coral', edgecolor='black')
plt.title('Average Tip by Day')
plt.xlabel('Day')
plt.ylabel('Average Tip')
plt.show()
 


##### 5. Pie Charts
a. Pie Chart for 'smoker'  
Create a pie chart showing the proportion of smokers vs. non-smokers.  
Set the colors to ['lightcoral', 'lightskyblue'].
Use autopct='%1.1f%%' in plt.pie to format the percentages.

In [None]:
smoker_counts = df['smoker'].value_counts()

In [None]:
plt.figure(figsize=(6, 6))
plt.pie(smoker_counts, labels=smoker_counts.index, 
        colors=['lightcoral', 'lightskyblue'], 
        autopct='%1.1f%%',  # not shown in class: formats each percentage with one decimal place
        startangle=90)
plt.title('Proportion of Smokers vs. Non-Smokers')
plt.show()

b. Pie Chart for 'time'  
Create a pie chart showing the proportion of Lunch vs. Dinner.  
Set the colors to ['lightgreen', 'lightblue'].  
Use startangle=90 inside plt.pie and see what it does to your chart.  

In [None]:
time_counts = df['time'].value_counts()

In [None]:
plt.figure(figsize=(6, 6))
plt.pie(time_counts,
        labels=time_counts.index, 
        colors=['lightgreen', 'lightblue'],
        autopct='%1.1f%%', 
        #startangle=90
       )
plt.title('Proportion of Lunch vs. Dinner')
plt.show()

c. Pie Chart for 'day'
Create a pie chart showing the distribution of days.  
Include the following customizations:
- Explode the 'Sunday' slice (labeled 'Sun' in the data) by 0.1.
- Set the colors to ['gold', 'yellowgreen', 'lightcoral', 'lightskyblue'].
- Add a shadow to the pie chart.

In [None]:
day_counts = df['day'].value_counts()
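# value_counts() sorts by frequency, so find the Sunday slice by its label, not its position
explode = [0.1 if day == 'Sun' else 0 for day in day_counts.index]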
colors = ['gold', 'yellowgreen', 'lightcoral', 'lightskyblue']
plt.figure(figsize=(6, 6))
plt.pie(day_counts,
        labels=day_counts.index,
        colors=colors, explode=explode,
        shadow=True)
plt.title('Distribution of Days')
plt.show()

d.
- Experiment with different start angles to rotate the pie chart and observe the visual effect.
- Experiment with different Explode values.
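
A possible way to compare settings is to draw several pies in one figure, as sketched below. The `day_counts` values are hypothetical round numbers standing in for `df['day'].value_counts()`, and the three (startangle, explode) pairs are arbitrary choices.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical counts standing in for df['day'].value_counts().
day_counts = pd.Series({'Sat': 40, 'Sun': 30, 'Thur': 20, 'Fri': 10})
colors = ['gold', 'yellowgreen', 'lightcoral', 'lightskyblue']

# Each pair is (startangle in degrees, explode offset for the 'Sun' slice).
settings = [(0, 0.1), (90, 0.1), (90, 0.3)]
fig, axes = plt.subplots(1, len(settings), figsize=(15, 5))
for ax, (angle, offset) in zip(axes, settings):
    # Build the explode list from the labels so it follows the slice order.
    explode = [offset if day == 'Sun' else 0 for day in day_counts.index]
    ax.pie(day_counts, labels=day_counts.index, colors=colors,
           explode=explode, startangle=angle, autopct='%1.1f%%')
    ax.set_title(f'startangle={angle}, explode={offset}')
plt.show()
```

The start angle rotates where the first slice begins, measured counterclockwise from the positive x-axis, and larger explode values push the selected slice further from the center.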