<a href="https://colab.research.google.com/github/UnitForDataScience/Fall-2024-Python-Open-Labs/blob/main/Convey_your_story_Data_viz_with_Python.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# The Numbers Don’t Speak for Themselves: Convey Your Story Using Data Visualization with Python

An Open Lab by ASU Library's UDS

# Starting with the Classics: Matplotlib


---

Matplotlib is one of the oldest data visualization packages out there. Advanced customization requires some time (and code), but it is a great place to start. We will begin by practicing the recipe we described.

In [None]:
# Import the matplotlib package and assign it an alias
import matplotlib.pyplot as plt

In [None]:
# Initialize two lists that will serve as our dataset
Days = [1, 2, 3, 4, 5, 6, 7]
Sales = [119, 102, 95, 93, 138, 110, 98]

In [None]:
# Create a line plot using "plot". We are also specifying our axis labels and title.
plt.plot(Days, Sales)
plt.title('Lightsabers Sold Over Time')
plt.xlabel('Day')
plt.ylabel('Lightsabers Sold')

# Display our graph
plt.show()

Now that you have run your first plot, go ahead and try changing the plot type!
You can use these for reference: a scatter plot (plt.scatter), a stem plot (plt.stem), or a bar chart (plt.bar).

In your own time, navigate to https://matplotlib.org/stable/plot_types/index.html to learn about all the types of plots that are available through matplotlib.
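If you would like to compare the plot types mentioned above at a glance, here is a minimal sketch that draws the same lightsaber data four ways in one figure. The four-panel layout and the panel titles are our own choices for illustration, not part of the recipe.

```python
import matplotlib.pyplot as plt

# Same dataset as in the cells above
Days = [1, 2, 3, 4, 5, 6, 7]
Sales = [119, 102, 95, 93, 138, 110, 98]

# One row of four panels that share the same y-axis
fig, axes = plt.subplots(1, 4, figsize=(16, 4), sharey=True)

axes[0].plot(Days, Sales)       # line plot
axes[0].set_title('plot')
axes[1].scatter(Days, Sales)    # scatter plot
axes[1].set_title('scatter')
axes[2].stem(Days, Sales)       # stem plot
axes[2].set_title('stem')
axes[3].bar(Days, Sales)        # bar chart
axes[3].set_title('bar')

# Label every x-axis, and the shared y-axis once
for ax in axes:
    ax.set_xlabel('Day')
axes[0].set_ylabel('Lightsabers Sold')

fig.suptitle('Lightsabers Sold Over Time')
fig.tight_layout()
```

In a notebook the figure renders at the end of the cell; add `plt.show()` if you run this as a script.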



---



# Case study: Exploring how GDP per Capita influences Life Expectancy

---

Let's start putting our skills to the test. In this example, we will create a scatter plot that shows the relationship between a country's GDP per capita and the mean life expectancy of its people, and we will improve it step by step.

Our data comes from the "Gapminder" dataset, which is available online. We will follow the general steps we described before, starting by loading our first packages and using "pandas" to read the dataset from the link.

*This example is inspired by Hans Rosling's work with data visualization*

In [None]:
# Import the matplotlib and pandas packages and assign them an alias
import matplotlib.pyplot as plt
import pandas as pd

# Load the Gapminder dataset from Plotly's datasets repository on GitHub
df = pd.read_csv("https://raw.githubusercontent.com/plotly/datasets/master/gapminderDataFiveYear.csv")
print(df.head(12))                       # head(12) returns the first twelve rows of the dataset, and print sends them to our output



In [None]:
# Filter the dataset by year; we will focus on 2007 for now
df_year = df[df['year'] == 2007]         # Keep only the rows where the year is 2007
print(df_year.head(12))                  # Show the first twelve rows of the filtered dataset
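
The filter above works in two steps: the comparison builds a column of True/False values (a boolean mask), and the square brackets keep only the rows marked True. Here is a small sketch of that idea. The `toy` table and its values are made up for illustration and are not Gapminder data.

```python
import pandas as pd

# Hypothetical toy table; the values are made up for illustration only
toy = pd.DataFrame({
    'country': ['A', 'A', 'B', 'B'],
    'year': [2002, 2007, 2002, 2007],
    'lifeExp': [60.0, 62.0, 70.0, 71.5],
})

# Step 1: the comparison returns one True/False value per row
mask = toy['year'] == 2007
print(mask.tolist())    # [False, True, False, True]

# Step 2: indexing with the mask keeps only the rows marked True
toy_2007 = toy[mask]
print(toy_2007)
```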


### Now that we loaded and filtered our dataset, let's go ahead and plot Life Expectancy vs GDP per Capita for 2007

In [None]:
# Basic scatter plot using Matplotlib
plt.scatter(df_year['gdpPercap'],
            df_year['lifeExp']
            )

# Add labels and title
plt.title('Life Expectancy vs GDP per Capita (2007)')
plt.xlabel('GDP per Capita')
plt.ylabel('Life Expectancy (Years)')

# Show the plot
plt.show()

### What is good about this plot? What can be improved?

### Let's go ahead and add some color to the continents to improve our takeaways

In [None]:
# Define a color map for continents
continent_colors = {'Africa': 'red', 'Americas': 'green', 'Asia': 'blue', 'Europe': 'orange', 'Oceania': 'purple'}
colors = df_year['continent'].map(continent_colors)

# Create a scatter plot with colored continents
plt.figure(figsize=(10, 6))  # Specify the size of our plot
plt.scatter(df_year['gdpPercap'],
            df_year['lifeExp'],
            c=colors)


# Add labels and title
plt.title('Life Expectancy vs GDP per Capita (2007) by Continent')
plt.xlabel('GDP per Capita')
plt.ylabel('Life Expectancy (Years)')

# Show the plot
plt.show()
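
One thing the colored plot does not tell the reader is which color belongs to which continent. A possible fix, sketched below, is to build the legend by hand from the same `continent_colors` dictionary. The `handles` name and the use of `Line2D` proxy markers are our own choices here; other approaches (such as one `scatter` call per continent) would work too.

```python
import matplotlib.pyplot as plt
from matplotlib.lines import Line2D

# Same color map as in the cell above
continent_colors = {'Africa': 'red', 'Americas': 'green', 'Asia': 'blue',
                    'Europe': 'orange', 'Oceania': 'purple'}

# One "proxy" marker per continent; these only exist to populate the legend
handles = [
    Line2D([], [], marker='o', linestyle='', color=color, label=continent)
    for continent, color in continent_colors.items()
]

fig, ax = plt.subplots(figsize=(10, 6))
# Your scatter call goes here, for example:
# ax.scatter(df_year['gdpPercap'], df_year['lifeExp'], c=colors)
ax.legend(handles=handles, title='Continent')
```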

### Let's add some more context by taking into account the population size of each country

In [None]:
# Create a scatter plot with bubble sizes based on population
plt.figure(figsize=(10, 6))
plt.scatter(df_year['gdpPercap'],
            df_year['lifeExp'],
            c=colors,
            s=df_year['pop']/1000000,
            alpha=0.6)

# Add labels and title
plt.title('Life Expectancy vs GDP per Capita (2007) with Population Bubble Sizes')
plt.xlabel('GDP per Capita')
plt.ylabel('Life Expectancy (Years)')

# Show the plot
plt.show()
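
A note on the `s` argument: matplotlib reads it as the marker area in points squared, so dividing the population by one million gives a country of 500,000 people an area of just 0.5, which is practically invisible. One possible alternative, sketched here, is to rescale the populations to a fixed range of marker areas. The `scale_sizes` helper and its default range are hypothetical, written for this example only.

```python
import pandas as pd

def scale_sizes(values, smallest=20, largest=2000):
    """Linearly rescale values to marker areas between smallest and largest."""
    values = pd.Series(values, dtype='float64')
    span = values.max() - values.min()
    if span == 0:
        # All values are equal, so every marker gets the smallest area
        return pd.Series(smallest, index=values.index, dtype='float64')
    return smallest + (values - values.min()) / span * (largest - smallest)

# Hypothetical populations, for illustration only
populations = [500_000, 10_000_000, 300_000_000, 1_300_000_000]

# Areas from dividing by one million, as in the cell above
print((pd.Series(populations) / 1_000_000).tolist())

# Areas after rescaling to the range 20 to 2000
print(scale_sizes(populations).round(1).tolist())
```

To try it on the real data, you could pass `s=scale_sizes(df_year['pop'])` to `plt.scatter`.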

# Introducing Seaborn

---
Seaborn is built on matplotlib, but it makes your life much easier by doing a lot of the coding for you. Overall, it takes less code to customize your plots and produce great-looking visualizations. The code below is a recreation of our last plot using the seaborn package.

In your own time, check out https://seaborn.pydata.org/index.html


In [None]:
# We begin by importing Seaborn and giving it an alias.
# Seaborn works alongside matplotlib, so we keep using plt (imported earlier) for the figure size, title, and labels
import seaborn as sns

# Create a scatter plot with Seaborn, coloring by continent and scaling bubbles by population
plt.figure(figsize=(10, 6))
sns.scatterplot(data=df_year,
                x='gdpPercap',
                y='lifeExp',
                hue='continent',
                size='pop',
                sizes=(20, 200),
                alpha=0.7)

# Add title and labels
plt.title('Life Expectancy vs GDP per Capita (2007)')
plt.xlabel('GDP per Capita')
plt.ylabel('Life Expectancy')

# Show the plot
plt.show()
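
Because seaborn is built on matplotlib, `sns.scatterplot` hands back a regular matplotlib Axes object, and anything you know from matplotlib still applies to it. The sketch below shows this with a log-scaled x-axis. The `toy` table is made up for illustration and is not Gapminder data.

```python
import pandas as pd
import seaborn as sns

# Hypothetical toy table; the values are made up for illustration only
toy = pd.DataFrame({
    'gdpPercap': [500, 2_000, 10_000, 40_000],
    'lifeExp': [50, 62, 72, 80],
    'continent': ['Africa', 'Asia', 'Americas', 'Europe'],
})

# scatterplot returns the matplotlib Axes it drew on
ax = sns.scatterplot(data=toy, x='gdpPercap', y='lifeExp', hue='continent')

# From here on, these are plain matplotlib methods
ax.set_xscale('log')
ax.set_title('Life Expectancy vs GDP per Capita (toy data)')
ax.set_xlabel('GDP per Capita (log scale)')
ax.set_ylabel('Life Expectancy')
```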

# Enter Plotly: Making our visualization interactive


---
Plotly will allow us to create interactive plots and animations with our data. Let's go ahead and use it to keep exploring our data.

In [None]:
# Import plotly.express and assign it the alias px
import plotly.express as px


# The sample dataset "Gapminder" comes bundled with Plotly
df = px.data.gapminder()

# Create an interactive scatter plot
fig = px.scatter(df.query("year == 2007"),  # Filter to 2007 with pandas' query method, which takes a boolean expression
                 x="gdpPercap",
                 y="lifeExp",
                 color="continent",
                 size="pop",
                 size_max=70)

# Customize and show the plot
fig.update_layout(title='Interactive Plot: GDP vs Life Expectancy (2007)')
fig.show()

# Let's run this once and check our x and y axes and our data points. Can we see most of the data clearly? How much information can we get out of this?
# Try adding these parameters to the figure: hover_name="country", log_x=True

### To better present our case, let's add an animation that goes through the years. Plotly will allow us to do this without having to make several plots.

In [None]:
import plotly.express as px

# Sample data: Gapminder dataset (available in Plotly)
df = px.data.gapminder()


# Create an interactive scatter plot
fig = px.scatter(df,                             # Notice that we are not filtering the dataset anymore
                 x="gdpPercap",
                 y="lifeExp",
                 animation_frame="year",         # This will allow us to animate by year
                 animation_group="country",      # This tells Plotly that the same country is the same bubble across frames
                 size="pop",
                 color="continent",
                 hover_name="country",
                 log_x=True,
                 size_max=70,
                 range_x=[200,60000],             # We can also specify the range of our axes
                 range_y=[25,90])

# Customize and show the plot
fig.update_layout(title='Interactive Plot: GDP vs Life Expectancy')
fig.show()

You did it! It was tough, but you used three different libraries to explore your data and created a visualization that can help your audience understand your results.

Here are some questions that can help you think through what we achieved: What are some of the insights that we get from looking at one year of data? What are some insights that we get from looking at the change through five decades? How did adding more context to the data help you understand what was happening?

If you want to learn more about Plotly, check out https://plotly.com/python/.