<h1>Libraries</h1>

<h3>What is a library?</h3>

<p>A library is a collection of pre-written, reusable code that developers can use to perform specific tasks.</p>

<h3>Why use libraries?</h3>

<p>Libraries mean that we don't have to rewrite code for problems that someone has already solved. For example, say I want to find the square root of a number. I can write my own code for it, or I can use the sqrt function in the math library, which will do it for me.</p>
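
<p>To make the square root example concrete, here is a minimal sketch comparing the two options. The hand-written my_sqrt function is a hypothetical helper (it uses Newton's method, which is only one of several possible approaches), while math.sqrt comes from Python's math library.</p>

```python
import math


def my_sqrt(number, tolerance=1e-12):
    """Hand-written square root using Newton's method (illustrative only)."""
    if number < 0:
        raise ValueError("cannot take the square root of a negative number")
    if number == 0:
        return 0.0
    guess = number
    # Keep refining the guess until guess * guess is close enough to number
    while abs(guess * guess - number) > tolerance * number:
        guess = (guess + number / guess) / 2
    return guess


# Option 1: our own code (about ten lines to write and test)
print(my_sqrt(16))

# Option 2: the library (one line)
print(math.sqrt(16))
```

<p>Both calls give a result very close to 4, but the library version is already written, tested, and documented.</p>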

<h3>What libraries are out there?</h3>

<p>Hundreds of thousands! The Python Package Index (PyPI) alone hosts hundreds of thousands of packages that anyone can install, and Python also ships with a standard library that includes modules such as datetime, math, and random. Other programming languages have their own libraries, too. And those are just the public libraries: many companies, including Bloomberg, have their own libraries that can be used with their internal code.</p>

<h3>What is needed to use a library?</h3>

<h6>pip install</h6> 

<p>Python comes with many modules preinstalled, including datetime, math, and random. For libraries that do not ship with Python, we can use pip install, which downloads and installs them so that we can use them. We can install libraries individually (pip install pandas), or list all of the required libraries in a file and install them at once (pip install -r requirements.txt).</p>
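
<p>Before importing a library, it can help to check whether it is installed yet. The sketch below is one possible way to do that with the standard importlib module; is_installed is a hypothetical helper name, and the check uses the import name, which can differ from the name passed to pip install.</p>

```python
import importlib.util


def is_installed(module_name):
    """Return True if Python can find a top-level module with this name."""
    return importlib.util.find_spec(module_name) is not None


# math ships with Python; pandas and plotly must be installed with pip
for name in ["math", "pandas", "plotly"]:
    if is_installed(name):
        print(name, "is installed")
    else:
        print(name, "is missing, try: pip install", name)
```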

<h6>import</h6>

<p>Once the libraries are installed, we need to import them into the file where we want to use them. We need to add imports in EVERY file that uses the libraries. If we import a library into file1.py, file2.py will not be able to use the library unless we import it there as well.</p>
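
<p>As a short illustration, these are three common ways to write an import, using the preinstalled modules mentioned earlier. The alias dt is just a conventional short name chosen for this sketch.</p>

```python
import math                  # import the whole module, then use math.sqrt
import datetime as dt        # import the module under a shorter alias
from random import randint   # import a single function by name

print(math.sqrt(25))
print(dt.date(2024, 1, 31).year)
print(randint(1, 6))         # a random whole number from 1 to 6
```

<p>These lines belong at the top of each file that uses the modules.</p>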

<h3>How do we know what functions a library has?</h3>

<p>Documentation! Every library should have documentation explaining what each function does and how to use it.

Python standard library documentation: https://docs.python.org/3/library/index.html</p>
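
<p>Documentation is often available from inside Python as well. A minimal sketch, using the math module as the example: dir lists the names a module provides, and a function's __doc__ attribute holds its short description.</p>

```python
import math

# dir() lists every name in the module; skip the internal ones starting with "_"
public_names = [name for name in dir(math) if not name.startswith("_")]
print(public_names[:10])

# Each function carries a short description called a docstring
print(math.sqrt.__doc__)

# In a notebook or interactive session, help(math.sqrt) prints a formatted version
```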

<h3>What is pandas?</h3>

<p>Pandas is a popular Python library used for data analysis and manipulation. It lets us load tabular data, such as a CSV file, into a DataFrame, then filter it, summarize it, and draw conclusions from it.

Pandas documentation: https://pandas.pydata.org/docs/user_guide/index.html</p>


In [2]:
import pandas as pd

df = pd.read_csv("music_data.csv")
df

Unnamed: 0,Favorite Genre,Pop Rating,Hip Hop Rating,Rock Rating,Country Rating,R&B/Soul Rating,Classical Rating,K-Pop Rating,EDM Rating,Jazz Rating,...,Music App,Favorite Artist,Happy,Energetic,Relaxed,Focused,Sad,Confident,Inspired,Nostalgic
0,Rock,4,4,5,4,4,4,3,2,2,...,Spotify,Owl City,False,True,False,False,True,False,False,True
1,Hip Hop/Rap,4,5,2,2,5,3,2,2,4,...,Spotify,Dave,False,False,True,False,False,False,False,False
2,Pop,4,3,4,1,3,2,1,2,2,...,Spotify,Maluma,True,True,False,True,False,False,False,False
3,Indie,5,5,5,1,4,4,1,1,5,...,Spotify,Djo,False,False,True,False,True,False,False,True
4,Rock,3,2,5,2,3,4,3,2,4,...,YouTube,The Clash,False,False,False,False,False,False,False,False
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
220,Pop,5,3,2,3,2,2,2,2,2,...,Apple Music,Taylor Swift,True,False,False,False,False,False,False,False
221,Pop,5,2,5,4,3,2,1,1,1,...,Apple Music,Taylor Swift,True,True,True,False,False,False,False,False
222,Pop,5,4,2,1,3,2,3,5,2,...,Spotify,Zayn,True,True,False,False,False,True,False,False
223,R&B/Soul,5,4,4,3,5,4,3,3,3,...,Apple Music,Hozier,False,False,False,True,False,False,False,False


In [3]:
df.describe()

Unnamed: 0,Pop Rating,Hip Hop Rating,Rock Rating,Country Rating,R&B/Soul Rating,Classical Rating,K-Pop Rating,EDM Rating,Jazz Rating,Indie Rating,Number of Hours
count,225.0,225.0,225.0,225.0,225.0,225.0,225.0,225.0,225.0,225.0,225.0
mean,4.533333,3.933333,3.666667,2.4,3.466667,2.933333,2.0,2.533333,2.8,3.2,3.533333
std,0.719623,1.0,1.30247,1.257123,0.886405,1.0,0.968246,1.149534,1.224745,1.472971,1.896896
min,3.0,2.0,1.0,1.0,2.0,1.0,1.0,1.0,1.0,1.0,1.0
25%,4.0,3.0,2.0,1.0,3.0,2.0,1.0,2.0,2.0,1.0,2.0
50%,5.0,4.0,4.0,2.0,3.0,3.0,2.0,2.0,3.0,4.0,3.0
75%,5.0,5.0,5.0,3.0,4.0,4.0,3.0,3.0,4.0,4.0,5.0
max,5.0,5.0,5.0,5.0,5.0,4.0,4.0,5.0,5.0,5.0,8.0
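
<p>To see what the rows of describe() mean, the sketch below recomputes them for one small list using the standard statistics module. The ratings are made up for illustration and are not taken from music_data.csv. It assumes pandas' defaults, which to my understanding are the sample standard deviation and linearly interpolated percentiles.</p>

```python
import statistics

# Hypothetical ratings from five people (illustrative data only)
ratings = [3, 4, 5, 5, 5]

count = len(ratings)
mean = statistics.mean(ratings)
std = statistics.stdev(ratings)  # sample standard deviation (divides by n - 1)
# Quartiles with linear interpolation between neighbouring values
q25, q50, q75 = statistics.quantiles(ratings, n=4, method="inclusive")

print("count", count)
print("mean ", mean)
print("std  ", round(std, 6))
print("min  ", min(ratings))
print("25%  ", q25)
print("50%  ", q50)
print("75%  ", q75)
print("max  ", max(ratings))
```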


<h3>What is plotly?</h3>

<p>Plotly is a Python library used for interactive data visualization and graphing.
    
Plotly documentation: https://plotly.com/python/</p>

<h3>Let's graph!</h3> 

<h4>Pie charts</h4>

<p>For our first exercise, let's make a pie chart to see which apps we use to listen to music.
    
https://plotly.com/python/pie-charts/#donut-chart</p>

In [9]:
import plotly.express as px

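# Count how many people use each app, giving a dict of app name -> count
app_dict = df["Music App"].value_counts().to_dict()

# Slice names are the apps; slice sizes are the counts
labels = list(app_dict.keys())
values = list(app_dict.values())

# labels and values are already plain lists, so no DataFrame argument is needed
fig = px.pie(values=values, names=labels, title="Music App Preference")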
fig.show()
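
<p>The value_counts() call above tallies how often each value appears in a column. As a rough standard-library sketch of the same idea, the example below counts a small made-up list of answers with collections.Counter; the list is illustrative and not taken from music_data.csv.</p>

```python
from collections import Counter

# Hypothetical answers; in the notebook these come from df["Music App"]
apps = ["Spotify", "Spotify", "YouTube", "Apple Music", "Spotify", "Apple Music"]

# most_common() orders the tallies from most to least frequent
app_counts = dict(Counter(apps).most_common())

labels = list(app_counts.keys())
values = list(app_counts.values())

print(labels)
print(values)
```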

<h4>Bar charts</h4>

<p>Let's make a bar chart to show at what times of day we listen to music and for how many hours in total.
    
https://plotly.com/python/bar-charts/</p>

In [7]:
# Group by time of day and sum hours
time_hours = df.groupby("Time of Day")["Number of Hours"].sum().reset_index()

# Sort for cleaner chart
time_hours = time_hours.sort_values(by="Number of Hours", ascending=True)

fig = px.bar(time_hours, 
             x="Time of Day", 
             y="Number of Hours", 
             color="Time of Day",
             title="Total Music Listening Hours by Time of Day",
             color_discrete_sequence=px.colors.qualitative.Set3)

fig.show()
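
<p>The groupby and sum step above adds up the hours for each time of day. The sketch below shows the same idea in plain Python on a few made-up rows; the times and hours are illustrative values, not data from music_data.csv.</p>

```python
from collections import defaultdict

# Hypothetical (time of day, hours) answers, for illustration only
rows = [("Morning", 2), ("Evening", 3), ("Morning", 1), ("Night", 4), ("Evening", 2)]

# Group by time of day and add up the hours in each group
totals = defaultdict(int)
for time_of_day, hours in rows:
    totals[time_of_day] += hours

# Sort from fewest to most hours, like sort_values(ascending=True)
sorted_totals = sorted(totals.items(), key=lambda item: item[1])
print(sorted_totals)
```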

<h4>Stacked bar graphs</h4>

<p>Let's make things a little more complicated. Here, we are going to use multiple columns from our music data to create a stacked bar chart to see how we've rated each genre of music.</p>

In [20]:
genres=[
    "Pop Rating",
    "Hip Hop Rating",
    "Rock Rating",
    "Country Rating",
    "R&B/Soul Rating",
    "Classical Rating",
    "K-Pop Rating",
    "EDM Rating",
    "Jazz Rating",
    "Indie Rating"
]

genre_map = {}
for genre in genres:
    genre_map[genre] = df[genre].value_counts().sort_index().to_dict()

genre_map

{'Pop Rating': {3: 30, 4: 45, 5: 150},
 'Hip Hop Rating': {2: 30, 3: 30, 4: 90, 5: 75},
 'Rock Rating': {1: 15, 2: 45, 3: 15, 4: 75, 5: 75},
 'Country Rating': {1: 75, 2: 45, 3: 60, 4: 30, 5: 15},
 'R&B/Soul Rating': {2: 30, 3: 90, 4: 75, 5: 30},
 'Classical Rating': {1: 15, 2: 75, 3: 45, 4: 90},
 'K-Pop Rating': {1: 90, 2: 60, 3: 60, 4: 15},
 'EDM Rating': {1: 30, 2: 105, 3: 60, 5: 30},
 'Jazz Rating': {1: 30, 2: 75, 3: 60, 4: 30, 5: 30},
 'Indie Rating': {1: 60, 3: 45, 4: 75, 5: 45}}
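
<p>The dictionary above is grouped by genre, but a stacked bar chart needs one list of counts per rating, with one entry for each genre. Here is a minimal sketch of that regrouping on a smaller, made-up dictionary; small_map and by_rating are hypothetical names, and a rating that nobody gave counts as 0.</p>

```python
# Hypothetical, smaller version of genre_map (illustrative counts only)
small_map = {
    "Pop Rating": {3: 1, 5: 4},
    "Rock Rating": {1: 2, 5: 3},
}

# For each rating 1-5, collect the count from every genre, using 0 when missing
by_rating = {
    rating: [counts.get(rating, 0) for counts in small_map.values()]
    for rating in range(1, 6)
}

print(by_rating)
```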

In [21]:
import plotly.graph_objects as go

genres=[
    "Pop Rating",
    "Hip Hop Rating",
    "Rock Rating",
    "Country Rating",
    "R&B/Soul Rating",
    "Classical Rating",
    "K-Pop Rating",
    "EDM Rating",
    "Jazz Rating",
    "Indie Rating"
]

genre_map = {}
for genre in genres:
    genre_map[genre] = df[genre].value_counts().to_dict()
    
# For each rating 1-5, list how many people gave that rating to each genre
# (0 if nobody gave that rating)
m = {}
for x in range(1, 6):
    m[x] = [genre_map[genre].get(x, 0) for genre in genres]

# One go.Bar trace per rating
fig = go.Figure(data=[
    go.Bar(name=str(x), x=genres, y=m[x]) for x in range(1, 6)
])
# Change the bar mode so the traces stack on top of each other for each genre
fig.update_layout(barmode='stack', title="Genre Ratings")
fig.show()