<a href="https://colab.research.google.com/github/aliciama16/is262a/blob/main/IS262A_Library.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Mount (aka Connect) Your Google Drive

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Import the pandas library to load your data as a DataFrame.

Use `df.head()` to take a quick 'peek' at the first five rows of your data.

In [None]:
import pandas as pd

df = pd.read_csv('/content/drive/MyDrive/Library_Usage.csv')
df.head()

# Value Count

Count how many times each value appears in a given column.

In [None]:
df["Patron Type Definition"].value_counts()

In [None]:
df["Year Patron Registered"].value_counts()
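
Two optional arguments are often handy here. The sketch below uses a tiny made-up sample (not the real library file) to show `normalize=True`, which returns shares instead of raw counts, and `dropna=False`, which also counts missing values.

```python
import pandas as pd

# Hypothetical sample data standing in for the library CSV
sample = pd.DataFrame(
    {"Patron Type Definition": ["Adult", "Adult", "Juvenile", None, "Adult"]}
)

# Raw counts, most common value first (missing values are skipped by default)
counts = sample["Patron Type Definition"].value_counts()

# Shares of the non-missing total instead of raw counts
shares = sample["Patron Type Definition"].value_counts(normalize=True)

# Include missing values as their own category
counts_with_missing = sample["Patron Type Definition"].value_counts(dropna=False)

print(counts)
print(shares)
print(counts_with_missing)
```

Comparing `counts` with `counts_with_missing` is a quick way to spot how many rows have no value in that column.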

# Visualizations with Matplotlib

Import Matplotlib to create visualizations.

* https://matplotlib.org/
* https://matplotlib.org/cheatsheets/
* https://matplotlib.org/stable/users/explain/colors/index.html

In [None]:
import matplotlib.pyplot as plt

# Plot the same column that the title and labels describe
df["Patron Type Definition"].value_counts().plot(
    kind="pie",
)

plt.title("Breakdown of Patron Type Definitions")
plt.ylabel("Number of Patrons")
plt.xlabel("Patron Type Definition")
plt.show()

Now think about how you want to change your graph. Perhaps we want to add a little bit of code to display percentages?

In [None]:
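# Plot the same column that the title and labels describe
df["Patron Type Definition"].value_counts().plot(
    kind="pie",
    autopct="%1.1f%%",  # label each slice with its percentage, one decimal place
)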

plt.title("Breakdown of Patron Type Definitions")
plt.ylabel("Number of Patrons")
plt.xlabel("Patron Type Definition")
plt.show()
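
If the `autopct` format string looks cryptic, here is a minimal sketch of what it produces. It uses a small made-up Series (an assumption, not the real data) and applies the same `"%1.1f%%"` pattern by hand: `%1.1f` means a number with one decimal place, and `%%` is a literal percent sign.

```python
import pandas as pd

# Hypothetical sample data standing in for the library CSV
sample = pd.Series(
    ["Adult", "Adult", "Juvenile", "Senior"], name="Patron Type Definition"
)

# Percentage of rows in each category
percentages = sample.value_counts(normalize=True) * 100

# Apply the same format string that autopct uses for each slice
labels = ["%1.1f%%" % value for value in percentages]

print(labels)
```

Changing the pattern to `"%1.0f%%"` would round the labels to whole percentages.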


In [None]:
import matplotlib.pyplot as plt

df.plot(
    kind='scatter',
    x='Total Checkouts',
    y='Total Renewals',
    color='purple',
    alpha=0.5
)

plt.title("Total Checkouts vs. Total Renewals")
plt.xlabel("Total Checkouts")
plt.ylabel("Total Renewals")
plt.show()


In [None]:
df["Circulation Active Month"].value_counts().plot(
    kind="bar",
    color=["skyblue"]
)

plt.title("Library Circulation Activity by Month")
plt.ylabel("Number of Circulations")
plt.xlabel("Month")
plt.show()

Explore the sorts of customizations you can make: reordering the bars, changing the colors, the outlines, the size of the chart, the way the labels display, the line styles, and more. There are so many possibilities!

In [None]:
# Sort so months appear chronologically
month_order = ["January","February","March","April","May","June","July","August","September","October","November","December"]
month_counts = df["Circulation Active Month"].value_counts().reindex(month_order)

# Plot
month_counts.plot(
    kind="bar",
    color="skyblue",
    edgecolor="black",
    figsize=(10,6)
)

plt.title("Library Circulation Activity by Month", fontsize=16, weight="bold")
plt.xlabel("Month", fontsize=12)
plt.ylabel("Number of Circulations", fontsize=12)
plt.xticks(rotation=45, ha="right")
plt.grid(axis="y", linestyle="--", alpha=0.6)
plt.tight_layout()
plt.show()
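
One thing to keep in mind about `.reindex()`: it keeps only the labels you list, in the order you list them. Any value not in the list is dropped, and any listed month with no rows shows up as `NaN`. The sketch below demonstrates this on a small made-up Series (the data and the shortened month list are assumptions for illustration) and shows the `fill_value=0` option for filling those gaps.

```python
import pandas as pd

# Hypothetical sample: no rows for February, plus one unexpected label
months = pd.Series(["March", "January", "March", "None"])
month_order = ["January", "February", "March"]

counts = months.value_counts()

# February has no rows, so it becomes NaN; "None" is not listed, so it is dropped
ordered = counts.reindex(month_order)

# Fill months that have no rows with 0 instead of NaN
ordered_filled = counts.reindex(month_order, fill_value=0)

print(ordered)
print(ordered_filled)
```

Because dropped labels disappear silently, it is worth comparing `counts.sum()` with the reindexed total before trusting the chart.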


# Check for Missing Data!

In [None]:
df.info()
df.isna().sum()

In this case, it seems to make sense to replace all the empty cells in the 'Circulation Active Month' column with something like 'Not Captured' (so we can show that missing data in our visualization).

In [None]:
df['Circulation Active Month'] = df['Circulation Active Month'].fillna('Not Captured')

Always double check to make sure it worked! (It did - yay!)

In [None]:
df.isna().sum()
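
The check, fill, and re-check pattern can be sketched end to end on a tiny made-up DataFrame (the values below are assumptions, not the real library data):

```python
import pandas as pd

# Hypothetical sample with two missing months
sample = pd.DataFrame(
    {
        "Circulation Active Month": ["May", None, "June", None],
        "Total Checkouts": [10, 0, 3, 7],
    }
)

# Count missing values per column before cleaning
missing_before = sample.isna().sum()

# Replace missing months with an explicit label
sample["Circulation Active Month"] = sample["Circulation Active Month"].fillna(
    "Not Captured"
)

# Count again to confirm the fill worked
missing_after = sample.isna().sum()

print(missing_before)
print(missing_after)
```

Filling with a label like this suits text columns. For numeric columns, a text label would change the column's type, so a different approach is usually needed.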

In [None]:
# Sort values so months appear chronologically
month_order = ["January","February","March","April","May","June","July","August","September","October","November","December", "Not Captured"]
month_counts = df["Circulation Active Month"].value_counts().reindex(month_order)

# Plot
month_counts.plot(
    kind="bar",
    color="hotpink",
    edgecolor="black",
    figsize=(10,6)
)

plt.title("Library Circulation Activity by Month", fontsize=16, weight="bold")
plt.xlabel("Month", fontsize=12)
plt.ylabel("Number of Circulations", fontsize=12)
plt.xticks(rotation=45, ha="right")
plt.grid(axis="y", linestyle="--", alpha=0.6)
plt.tight_layout()
plt.show()

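Did you notice the years were showing up as YYYY.0 in the main DataFrame? On import, pandas stored the column as a decimal number (float) instead of an integer, which usually happens when a column contains missing values. We can change that with one quick line of code that converts it to the nullable `Int64` type: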

In [None]:
df["Circulation Active Year"] = df["Circulation Active Year"].astype("Int64")

df.head()
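
Here is a small sketch of why this happens, using made-up years rather than the real column. A single missing value is enough to make pandas store whole numbers as floats, and the nullable `Int64` type (capital "I") can hold integers and missing values together.

```python
import numpy as np
import pandas as pd

# Hypothetical years with one missing value
years = pd.Series([2016, 2017, np.nan])

# The missing value forces a float column, so 2016 displays as 2016.0
print(years.dtype)

# Convert to the nullable integer type; the missing value becomes <NA>
as_int = years.astype("Int64")

print(as_int)
```

Note that lowercase `"int64"` would raise an error here, because the regular integer type cannot represent missing values.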


# Lowercase a Column

In [None]:
df["Patron Type Definition"] = df["Patron Type Definition"].str.lower()

df.head()

# Create a New Cleaned Column Instead

In [None]:
df["Circ_Month_Cleaned"] = df["Circulation Active Month"].str.lower()

df.head()
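
Why bother lowercasing? Values that differ only in capitalization or stray spaces are counted as separate categories. The sketch below uses made-up values and a hypothetical column name, `Patron_Type_Cleaned`, to show the effect. It also adds `.str.strip()`, which the cells above do not use, to remove leading and trailing spaces.

```python
import pandas as pd

# Hypothetical messy values: same category typed three different ways
sample = pd.DataFrame(
    {"Patron Type Definition": ["ADULT", "Adult", "adult ", "Senior"]}
)

# Keep the original column and write the cleaned text to a new one
sample["Patron_Type_Cleaned"] = (
    sample["Patron Type Definition"].str.strip().str.lower()
)

# Number of distinct categories before and after cleaning
before = sample["Patron Type Definition"].nunique()
after = sample["Patron_Type_Cleaned"].nunique()

print(before, after)
```

Writing to a new column keeps the original values available in case you need to compare or undo the cleaning.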


# Save your new data frame!

NOTE: This only saves your new CSV file to your Google Drive. It does not save your graphs. You can right-click on those to save them!



In [None]:
df.to_csv("/content/drive/MyDrive/WHAT YOU WANT TO NAME IT.csv", index=False)
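
Wondering what `index=False` does? The sketch below writes a small made-up DataFrame to a temporary folder (the file names are hypothetical) and reads it back both ways. Without `index=False`, the row numbers are saved as an extra column that reappears as `Unnamed: 0` on reload.

```python
import os
import tempfile

import pandas as pd

# Hypothetical cleaned data
sample = pd.DataFrame(
    {"Patron Type Definition": ["adult", "senior"], "Total Checkouts": [12, 3]}
)

with tempfile.TemporaryDirectory() as tmp_dir:
    # Save without the row index, then read it back
    clean_path = os.path.join(tmp_dir, "library_cleaned.csv")
    sample.to_csv(clean_path, index=False)
    reloaded = pd.read_csv(clean_path)

    # Save with the default index to see the extra column it creates
    indexed_path = os.path.join(tmp_dir, "library_with_index.csv")
    sample.to_csv(indexed_path)
    reloaded_with_index = pd.read_csv(indexed_path)

print(list(reloaded.columns))
print(list(reloaded_with_index.columns))
```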


# Upload your code to GitHub!

Hmmm, but what if we want to save this code? What if this is how we want to show other researchers how we created our graphs and how we cleaned the data to do so?

Perhaps we should save to **GITHUB??!**

Go to:
1. File
2. Save a copy in GitHub
3. Choose the repository you want to upload to!
4. Make sure you save to your main branch
5. Commit and then go check GitHub!