# Dataset Info

This notebook describes and previews each of the datasets you can choose from for the midterm project.

In [1]:
import pandas as pd

In [35]:
pd.options.display.max_rows = 200
pd.options.display.max_columns = 50

## Spotify

This Spotify data includes the top 200 most played songs for the week of 3/4/2021 - 3/11/2021 in several countries: the US, Vietnam, Japan, Turkey, Mexico, Morocco, Egypt, and Sweden. These countries were chosen at random by Prof. Walsh.

The data was drawn from [Spotify Charts](https://spotifycharts.com/regional/us/weekly/latest). For more information, see https://artists.spotify.com/faq/stats#charts

In [9]:
spotify_df = pd.read_csv('Spotify-Top-Charts_March-2021.csv', encoding='utf-8')

In [10]:
spotify_df.head(1)

Unnamed: 0,Position,Track Name,Artist,Streams,URL,country
0,1,Chúng Ta Của Hiện Tại,Sơn Tùng M-TP,344644,https://open.spotify.com/track/17iGUekw5nFt5mI...,Vietnam
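
Each preview in this notebook starts with an `Unnamed: 0` column. A likely cause (an assumption, not confirmed by the source files) is that the CSVs were saved with their row index, which leaves the first header field blank. The sketch below rebuilds the one-row preview above as an in-memory CSV and shows how `index_col=0` could load that column as the index instead. The names `sample_csv` and `spotify_sample` are made up for this illustration.

```python
import io
import pandas as pd

# Hypothetical stand-in for the CSV file: a blank first header field
# (our assumption) followed by the single row shown in the preview.
sample_csv = io.StringIO(
    ",Position,Track Name,Artist,Streams,URL,country\n"
    "0,1,Chúng Ta Của Hiện Tại,Sơn Tùng M-TP,344644,"
    "https://open.spotify.com/track/17iGUekw5nFt5mI...,Vietnam\n"
)

# index_col=0 uses the first column as the row index
# instead of loading it as a column named "Unnamed: 0".
spotify_sample = pd.read_csv(sample_csv, index_col=0)
print(spotify_sample.columns.tolist())
```

The same argument should work for the other datasets in this notebook if their files follow the same layout.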


## Seattle Public Library

The Seattle Public Library makes its [circulation data available to the public](https://data.seattle.gov/Community/Checkouts-by-Title/tmmm-ytt6/data#revert) (in anonymized form). This dataset includes items that were checked out more than 20 times a month in 2020 and 2021.

For more information about the columns in this dataset, see https://data.seattle.gov/Community/Checkouts-by-Title/tmmm-ytt6

In [15]:
spl_df = pd.read_csv('Seattle-Library_2020-2021.csv', encoding='utf-8')

In [16]:
spl_df.head(1)

Unnamed: 0,UsageClass,CheckoutType,MaterialType,CheckoutYear,CheckoutMonth,Checkouts,Title,Creator,Subjects,Publisher,PublicationYear,CheckoutMonthYear
0,Digital,OverDrive,EBOOK,2020,4,23,Eileen: A Novel,Ottessa Moshfegh,"Fiction, Literature, Thriller","Penguin Group (USA), Inc.",2015,2020-04
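
The `CheckoutMonthYear` column holds values like `2020-04`, which pandas will most likely read as plain text. If you want to sort or plot by date, one option is to parse it with `pd.to_datetime`. This is a minimal sketch on a one-row stand-in built from the preview above; `spl_sample` is a hypothetical name, and the snippet assumes every value follows the `YYYY-MM` pattern.

```python
import pandas as pd

# One-row stand-in built from the preview above (not the full dataset).
spl_sample = pd.DataFrame({
    "Title": ["Eileen: A Novel"],
    "Checkouts": [23],
    "CheckoutMonthYear": ["2020-04"],
})

# Parse the "YYYY-MM" strings into timestamps (first day of each month).
spl_sample["CheckoutMonthYear"] = pd.to_datetime(
    spl_sample["CheckoutMonthYear"], format="%Y-%m"
)
print(spl_sample.dtypes)
```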


## Book Translations

This dataset includes information about all original publications of fiction and poetry published in the U.S. in English translation since 2008. The data is drawn from Publishers Weekly's [Translation Database](https://www.publishersweekly.com/pw/translation/home/index.html).

For more information, see https://www.publishersweekly.com/pw/corp/translation-database-FAQ.html

In [25]:
translation_df = pd.read_csv('PW-Book-Translations.csv', encoding='utf-8')

In [26]:
translation_df.head(1)

Unnamed: 0,isbn,e-isbn,title,auth-first,auth-last,trnsl-first,trnsl-last,publisher,genre,price,pubdate mo,pubdate yr,language,country,addl auth,addl trnsl,auth gender,trnsl gend,count
0,9781477829783,,Shades of White,Ki-Ela,,Kate,Northrop,AmazonCrossing,Fiction,14.95,May,2015.0,German,Germany,,,Female,Female,1
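
In the preview, `pubdate yr` appears as `2015.0` rather than `2015`. pandas usually stores a whole-number column as floats when some rows are missing, so that is the likely explanation here (an assumption; check the full column to confirm). The sketch below shows one way to get whole-number years back while keeping missing values, using the nullable `Int64` dtype. The series `years` is a hypothetical stand-in, and its missing entry is added only for illustration.

```python
import pandas as pd

# Stand-in: the previewed year plus one missing value. The NaN is an
# assumption that would explain why the column was read as float.
years = pd.Series([2015.0, float("nan")], name="pubdate yr")

# The nullable "Int64" dtype stores whole numbers and keeps
# missing values as <NA> instead of forcing floats.
years_int = years.astype("Int64")
print(years_int)
```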


## Biopics

This dataset includes information about 676 "biographical" films from 1900 to 2014. It includes information about the gender, race, and occupation of each biographical subject in each film. This data was curated by Hannah Fingerhut for her 538 essay.

For more information, see the [538 essay](https://fivethirtyeight.com/features/straight-outta-compton-is-the-rare-biopic-not-about-white-dudes/) and the corresponding [GitHub repository](https://github.com/fivethirtyeight/data/tree/master/biopics).

In [27]:
biopics_df = pd.read_csv('Biopics.csv', encoding='utf-8')

In [28]:
biopics_df.head(1)

Unnamed: 0,title,site,country,year_release,box_office,director,number_of_subjects,subject,type_of_subject,race_known,subject_race,person_of_color,subject_sex,lead_actor_actress
0,10 Rillington Place,http://www.imdb.com/title/tt0066730/,UK,1971,-,Richard Fleischer,1,John Christie,Criminal,Unknown,,0,Male,Richard Attenborough
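
In the preview row, `box_office` is `-`. Assuming that dash marks an unknown figure (the preview alone does not confirm this), you may want pandas to treat it as missing when loading. The sketch below does that with the `na_values` argument of `pd.read_csv` on a small in-memory CSV. It keeps only three of the previewed columns, and `sample_csv` and `biopics_sample` are hypothetical names.

```python
import io
import pandas as pd

# Hypothetical stand-in for the CSV file, using three columns
# from the preview row above.
sample_csv = io.StringIO(
    "title,year_release,box_office\n"
    "10 Rillington Place,1971,-\n"
)

# Treat "-" as missing, but only in the box_office column
# (assumes the dash means "no figure available").
biopics_sample = pd.read_csv(sample_csv, na_values={"box_office": ["-"]})
print(biopics_sample)
```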


## NBA

This data includes information and basketball statistics about NBA players going back to 1977.

This data was drawn from 538 and Neil Paine's essay ["Luka Dončić And The Mavs Are Pushing The Limits Of Offensive Efficiency."](https://fivethirtyeight.com/features/luka-doncic-and-the-mavs-are-pushing-the-limits-of-offensive-efficiency/) For more information, see the corresponding [GitHub repository](https://github.com/fivethirtyeight/nba-player-advanced-metrics) and the explanation of [RAPTOR metrics](https://fivethirtyeight.com/features/how-our-raptor-metric-works/).

In [31]:
nba_df = pd.read_csv('NBA-Historical.csv', encoding='utf-8')

In [36]:
nba_df.head(1)

Unnamed: 0,player_id,name_common,year_id,type,age,team_id,pos,tmRtg,franch_id,G,Min,MP%,MPG,P/36,TS%,A/36,R/36,SB/36,TO/36,Raptor O,Raptor D,Raptor+/-,Raptor WAR,PIE%,AWS%,USG%,AST%,TOV%,ORB%,DRB%,TRB%,STL%,BLK%,ORtg,%Pos,DRtg,2P%,3P%,FT%,3PAr,FTAr,Pace +/-
0,youngtr01,Trae Young,2020,RS,21,ATL,PG,,ATL,60,2120,,35.3,,59.5,,,,,7.1,-3.3,3.7,7.19,,,34.9,45.6,16.2,1.6,11.5,6.5,1.4,0.3,,,,,,,45.5,44.8,2.9
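
Several columns in the preview row are empty (for example `tmRtg` and `P/36`), so it may be worth checking how complete a column is before building an analysis on it. This is a minimal sketch on a one-row stand-in that copies four fields from the preview; `nba_sample` and `missing_counts` are hypothetical names. On the full data you could call the same method on `nba_df`.

```python
import pandas as pd

# One-row stand-in copying four fields from the preview above;
# empty fields in the preview are represented as NaN.
nba_sample = pd.DataFrame({
    "name_common": ["Trae Young"],
    "tmRtg": [float("nan")],
    "MPG": [35.3],
    "P/36": [float("nan")],
})

# Count the missing values in each column.
missing_counts = nba_sample.isna().sum()
print(missing_counts)
```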
