# Lab 2 — pandas practice: Trending Songs (5 points)

In this lab, you will work through a **single pandas mini‑case** with five questions. Please:
- Write each answer.
- Put **each task in its own code cell**.
- Run all code cells so that every block shows its output before exporting.
- Export as HTML (.html) and submit that file. 
- To get all points for each question, you need to include the output of your code AND your explanation if required.

---

## Mini‑Case: pandas — Trending Songs

**Context:** You’re helping a music streaming platform analyze a small set of **trending songs**. The dataset includes the song title, artist, release year, average critic rating, and total number of streams (in millions). The curation team wants to preview the data, check its structure, pull out subsets of interest, and add a simple “recent” flag. All operations should come from **Chapter 5** (pandas).

In [1]:
import pandas as pd

data = {
    'title':  ['Blinding Lights', 'Bad Guy', 'Dance Monkey', 'Peaches', 'As It Was', 'Anti-Hero', 'Levitating', 'Flowers'],
    'artist': ['The Weeknd', 'Billie Eilish', 'Tones and I', 'Justin Bieber', 'Harry Styles', 'Taylor Swift', 'Dua Lipa', 'Miley Cyrus'],
    'year':   [2019, 2019, 2019, 2021, 2022, 2022, 2020, 2023],
    'rating': [9.1, 8.6, 8.0, 7.8, 8.5, 8.7, 8.4, 8.3],
    'streams': [3600, 3200, 2800, 2100, 2600, 2500, 3000, 2700]  # in millions
}

df = pd.DataFrame(data)
df

Unnamed: 0,title,artist,year,rating,streams
0,Blinding Lights,The Weeknd,2019,9.1,3600
1,Bad Guy,Billie Eilish,2019,8.6,3200
2,Dance Monkey,Tones and I,2019,8.0,2800
3,Peaches,Justin Bieber,2021,7.8,2100
4,As It Was,Harry Styles,2022,8.5,2600
5,Anti-Hero,Taylor Swift,2022,8.7,2500
6,Levitating,Dua Lipa,2020,8.4,3000
7,Flowers,Miley Cyrus,2023,8.3,2700


### Q1 — Preview like a playlist
**Narrative:** Before prioritizing songs for editorial placement, the team wants a quick **top and tail** preview to eyeball entries.

**Task:** Show the **first 3** rows and the **last 2** rows using `head()` and `tail()`. Hint: Display results using `display()`.

In [2]:
display(df.head(3))
display(df.tail(2))

Unnamed: 0,title,artist,year,rating,streams
0,Blinding Lights,The Weeknd,2019,9.1,3600
1,Bad Guy,Billie Eilish,2019,8.6,3200
2,Dance Monkey,Tones and I,2019,8.0,2800


Unnamed: 0,title,artist,year,rating,streams
6,Levitating,Dua Lipa,2020,8.4,3000
7,Flowers,Miley Cyrus,2023,8.3,2700
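As a side note, `head(n)` and `tail(n)` select rows purely by position, so the same preview can be written with `.iloc` slices. The sketch below uses a shortened stand-in frame (the name `songs` and its five rows are only for illustration) and `print` instead of `display` so it also runs outside a notebook.

```python
import pandas as pd

# Shortened stand-in for the lab DataFrame; only the row order matters here.
songs = pd.DataFrame({
    "title": ["Blinding Lights", "Bad Guy", "Dance Monkey", "Peaches", "As It Was"],
    "year": [2019, 2019, 2019, 2021, 2022],
})

first_three = songs.iloc[:3]   # same rows as songs.head(3)
last_two = songs.iloc[-2:]     # same rows as songs.tail(2)

print(first_three.equals(songs.head(3)))  # True
print(last_two.equals(songs.tail(2)))     # True
```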


### Q2 — Structural and numeric sanity checks
**Narrative:** A quick structural and statistical scan helps catch data‑type surprises and spot outliers in ratings or streams.

**Task:** Display `df.describe()`.

In [3]:
df.describe()

Unnamed: 0,year,rating,streams
count,8.0,8.0,8.0
mean,2020.625,8.425,2812.5
std,1.59799,0.406202,458.062691
min,2019.0,7.8,2100.0
25%,2019.0,8.225,2575.0
50%,2020.5,8.45,2750.0
75%,2022.0,8.625,3050.0
max,2023.0,9.1,3600.0


Explain the meaning of each row:

`count`: The number of non-missing values in each column. Every column shows 8 because the DataFrame has 8 rows and none of them is missing a value in `year`, `rating`, or `streams`.

`mean`: The arithmetic mean of the column: the sum of all values divided by the number of values.

`std`: The standard deviation of the column, which measures how far the values typically spread around the mean. pandas reports the sample standard deviation (dividing by n - 1).

`min`, `25%`, `50%`, `75%`, `max`: The smallest and largest values in the column, together with the 25th, 50th (median), and 75th percentiles.
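To make the row meanings concrete, each statistic can be recomputed one call at a time and compared with `describe()`. This is a minimal sketch on the `rating` values only; the names `ratings`, `summary`, and `manual` are introduced here for illustration.

```python
import pandas as pd

# The eight rating values from the lab dataset.
ratings = pd.Series([9.1, 8.6, 8.0, 7.8, 8.5, 8.7, 8.4, 8.3], name="rating")

summary = ratings.describe()

# Rebuild each row of describe() with the matching Series method.
manual = pd.Series({
    "count": ratings.count(),        # non-missing values
    "mean": ratings.mean(),          # sum / count
    "std": ratings.std(),            # sample standard deviation (ddof=1)
    "min": ratings.min(),
    "25%": ratings.quantile(0.25),
    "50%": ratings.quantile(0.50),   # the median
    "75%": ratings.quantile(0.75),
    "max": ratings.max(),
})

# Side-by-side comparison; the two columns should agree.
print(pd.DataFrame({"describe": summary, "manual": manual}))
```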

### Q3 — Column picks for editorial review
**Narrative:** Editors often need just the titles, and sometimes titles alongside ratings for quick debates.

**Task:** Select only the `title` column (Series). Then select both `title` and `rating` (DataFrame).

In [4]:
# A single label returns a Series.
display(df.loc[:, 'title'])
# A list of labels returns a DataFrame.
display(df.loc[:, ['title', 'rating']])

0    Blinding Lights
1            Bad Guy
2       Dance Monkey
3            Peaches
4          As It Was
5          Anti-Hero
6         Levitating
7            Flowers
Name: title, dtype: object

Unnamed: 0,title,rating
0,Blinding Lights,9.1
1,Bad Guy,8.6
2,Dance Monkey,8.0
3,Peaches,7.8
4,As It Was,8.5
5,Anti-Hero,8.7
6,Levitating,8.4
7,Flowers,8.3
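The Series versus DataFrame distinction depends on whether a single label or a list of labels is passed. The sketch below illustrates this with plain bracket selection on a two-row stand-in frame (`songs` is a hypothetical name) by printing the type and shape of each result.

```python
import pandas as pd

# Two-row stand-in frame, just enough to show the return types.
songs = pd.DataFrame({"title": ["Peaches", "Flowers"], "rating": [7.8, 8.3]})

as_series = songs["title"]                # single label -> Series
as_frame = songs[["title"]]               # list with one label -> one-column DataFrame
two_columns = songs[["title", "rating"]]  # list of labels -> DataFrame

print(type(as_series).__name__, as_series.shape)      # Series (2,)
print(type(as_frame).__name__, as_frame.shape)        # DataFrame (2, 1)
print(type(two_columns).__name__, two_columns.shape)  # DataFrame (2, 2)
```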


### Q4 — Drill down by position vs. by label
**Narrative:** One editor asks: “What do the first two entries look like?” (position-based). Another asks: “Show me exactly the entry with index label `3`, but only the `title` and `year`.” (label-based)

**Task:** Use `.iloc` to select the **first two rows by position**; use `.loc` to fetch the **row with label 3**, restricted to the `title` and `year` columns.

In [9]:
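# Position-based: rows at positions 0 and 1 (the end of the slice is excluded).
display(df.iloc[0:2])
# Label-based: the row labeled 3, keeping only the requested columns.
display(df.loc[3, ['title', 'year']])

Unnamed: 0,title,artist,year,rating,streams
0,Blinding Lights,The Weeknd,2019,9.1,3600
1,Bad Guy,Billie Eilish,2019,8.6,3200


title    Peaches
year        2021
Name: 3, dtype: object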
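With the default 0..n-1 index, positions and labels happen to coincide, which can hide the difference between `.iloc` and `.loc`. The sketch below, on a four-row stand-in frame (`songs` and `by_rating` are illustrative names), sorts the rows so the two diverge, and also shows that `.loc` slices include the end label while `.iloc` slices exclude the end position.

```python
import pandas as pd

songs = pd.DataFrame({
    "title": ["Blinding Lights", "Bad Guy", "Dance Monkey", "Peaches"],
    "rating": [9.1, 8.6, 8.0, 7.8],
})

# Sorting reorders the rows, but each row keeps its original index label.
by_rating = songs.sort_values("rating")

print(by_rating.iloc[0]["title"])  # position 0 -> Peaches (lowest rating)
print(by_rating.loc[0, "title"])   # label 0    -> Blinding Lights

# Slices: .iloc excludes the end position, .loc includes the end label.
print(len(songs.iloc[0:2]))  # 2
print(len(songs.loc[0:2]))   # 3
```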

### Q5 — Flag recent songs and sort by score
**Narrative:** The team wants to compare recent songs (released after 2020) with older ones, and they care about popularity (streams) as well as quality (rating). To keep it simple, you’ll build a combined score:
`score = rating * (streams / 100)`

This way, higher-rated songs with more streams float to the top. Then you’ll produce two ranked lists.

**Task:**

- Add a new column `is_recent = (year > 2020)`.
- Add a new column `score = rating * (streams / 100)`.
- Sort the DataFrame by `score` (descending).
- Print:
  - The top 3 songs overall by this score.
  - The top 2 recent songs (where `is_recent == True`) by this score.

In [6]:
# Add both derived columns without modifying df.
compare = df.assign(
    is_recent=df["year"] > 2020,
    score=df["rating"] * (df["streams"] / 100),
)
# Sort once by score, highest first.
ranked = compare.sort_values(by="score", ascending=False)
display(ranked.head(3))
# Keep only recent songs before taking the top 2.
display(ranked[ranked["is_recent"]].head(2))

Unnamed: 0,title,artist,year,rating,streams,is_recent,score
0,Blinding Lights,The Weeknd,2019,9.1,3600,False,327.6
1,Bad Guy,Billie Eilish,2019,8.6,3200,False,275.2
6,Levitating,Dua Lipa,2020,8.4,3000,False,252.0


Unnamed: 0,title,artist,year,rating,streams,is_recent,score
7,Flowers,Miley Cyrus,2023,8.3,2700,True,224.1
4,As It Was,Harry Styles,2022,8.5,2600,True,221.0
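As an alternative to sorting and then calling `head()`, pandas also offers `DataFrame.nlargest`, which may or may not be covered in Chapter 5. The sketch below rebuilds the needed columns under the illustrative name `songs` and should yield the same two rankings as above.

```python
import pandas as pd

songs = pd.DataFrame({
    "title": ["Blinding Lights", "Bad Guy", "Dance Monkey", "Peaches",
              "As It Was", "Anti-Hero", "Levitating", "Flowers"],
    "year": [2019, 2019, 2019, 2021, 2022, 2022, 2020, 2023],
    "rating": [9.1, 8.6, 8.0, 7.8, 8.5, 8.7, 8.4, 8.3],
    "streams": [3600, 3200, 2800, 2100, 2600, 2500, 3000, 2700],  # in millions
})
songs["is_recent"] = songs["year"] > 2020
songs["score"] = songs["rating"] * (songs["streams"] / 100)

# nlargest(n, column) returns the n rows with the highest values in that column.
top_overall = songs.nlargest(3, "score")
# Filter to recent songs first, then rank within that subset.
top_recent = songs[songs["is_recent"]].nlargest(2, "score")

print(top_overall[["title", "score"]])
print(top_recent[["title", "score"]])
```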
