# DataFrame Basics Exercise

## Part 1
* Use pandas to read the `bestsellers` dataset into a DataFrame
* Once you've done that, use pandas to find how many rows and columns the DataFrame has
* Inspect the first 5 rows
* Inspect the first 19 rows
* Inspect the last 5 rows
* Inspect the last 2 rows
* Which columns (if any) are missing values?
* What data type did pandas assign to "User Rating"?
* How many integer columns are in the DataFrame?

In [1]:
import pandas as pd

In [3]:
bestsellers = pd.read_csv("data/bestsellers.csv")
bestsellers

| | Name | Author | User Rating | Reviews | Price | Year | Genre |
|---|---|---|---|---|---|---|---|
| 0 | 10-Day Green Smoothie Cleanse | JJ Smith | 4.7 | 17350 | 8 | 2016 | Non Fiction |
| 1 | 11/22/63: A Novel | Stephen King | 4.6 | 2052 | 22 | 2011 | Fiction |
| 2 | 12 Rules for Life: An Antidote to Chaos | Jordan B. Peterson | 4.7 | 18979 | 15 | 2018 | Non Fiction |
| 3 | 1984 (Signet Classics) | George Orwell | 4.7 | 21424 | 6 | 2017 | Fiction |
| 4 | 5,000 Awesome Facts (About Everything!) (Natio... | National Geographic Kids | 4.8 | 7665 | 12 | 2019 | Non Fiction |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 545 | Wrecking Ball (Diary of a Wimpy Kid Book 14) | Jeff Kinney | 4.9 | 9413 | 8 | 2019 | Fiction |
| 546 | You Are a Badass: How to Stop Doubting Your Gr... | Jen Sincero | 4.7 | 14331 | 8 | 2016 | Non Fiction |
| 547 | You Are a Badass: How to Stop Doubting Your Gr... | Jen Sincero | 4.7 | 14331 | 8 | 2017 | Non Fiction |
| 548 | You Are a Badass: How to Stop Doubting Your Gr... | Jen Sincero | 4.7 | 14331 | 8 | 2018 | Non Fiction |
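
The cell above displays the whole frame, but the task list also asks for its dimensions and for specific head and tail slices. A minimal sketch of those calls, assuming a hypothetical `sample` frame built from the first five rows displayed above in place of the real CSV (the last title stays truncated exactly as the display shows it):

```python
import pandas as pd

# Hypothetical stand-in for `bestsellers`: the first five rows displayed above.
sample = pd.DataFrame({
    "Name": [
        "10-Day Green Smoothie Cleanse",
        "11/22/63: A Novel",
        "12 Rules for Life: An Antidote to Chaos",
        "1984 (Signet Classics)",
        "5,000 Awesome Facts (About Everything!) (Natio...",
    ],
    "Author": ["JJ Smith", "Stephen King", "Jordan B. Peterson",
               "George Orwell", "National Geographic Kids"],
    "User Rating": [4.7, 4.6, 4.7, 4.7, 4.8],
    "Reviews": [17350, 2052, 18979, 21424, 7665],
    "Price": [8, 22, 15, 6, 12],
    "Year": [2016, 2011, 2018, 2017, 2019],
    "Genre": ["Non Fiction", "Fiction", "Non Fiction", "Fiction", "Non Fiction"],
})

# .shape is a (rows, columns) tuple; it is an attribute, not a method.
rows, cols = sample.shape
print(rows, cols)  # 5 7

first_five = sample.head()     # head() defaults to n=5
first_three = sample.head(3)   # on the real data: bestsellers.head(19)
last_five = sample.tail()      # tail() also defaults to n=5
last_two = sample.tail(2)      # on the real data: bestsellers.tail(2)
```

On the real frame the same calls apply unchanged, e.g. `bestsellers.shape` and `bestsellers.head(19)`.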


In [4]:
bestsellers.min()

Name           10-Day Green Smoothie Cleanse
Author                      Abraham Verghese
User Rating                              3.3
Reviews                                   37
Price                                      0
Year                                    2009
Genre                                Fiction
dtype: object

In [5]:
bestsellers.max()

Name           You Are a Badass: How to Stop Doubting Your Gr...
Author                                              Zhi Gang Sha
User Rating                                                  4.9
Reviews                                                    87841
Price                                                        105
Year                                                        2019
Genre                                                Non Fiction
dtype: object

In [9]:
bestsellers.head().mean(numeric_only=True)

User Rating        4.7
Reviews        13494.0
Price             12.6
Year            2016.2
dtype: float64

In [14]:
bestsellers.mode()

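| | Name | Author | User Rating | Reviews | Price | Year | Genre |
|---|---|---|---|---|---|---|---|
| 0 | Publication Manual of the American Psychologic... | Jeff Kinney | 4.8 | 8580.0 | 8.0 | 2009 | Non Fiction |
| 1 | NaN | NaN | NaN | NaN | NaN | 2010 | NaN |
| 2 | NaN | NaN | NaN | NaN | NaN | 2011 | NaN |
| 3 | NaN | NaN | NaN | NaN | NaN | 2012 | NaN |
| 4 | NaN | NaN | NaN | NaN | NaN | 2013 | NaN |
| 5 | NaN | NaN | NaN | NaN | NaN | 2014 | NaN |
| 6 | NaN | NaN | NaN | NaN | NaN | 2015 | NaN |
| 7 | NaN | NaN | NaN | NaN | NaN | 2016 | NaN |
| 8 | NaN | NaN | NaN | NaN | NaN | 2017 | NaN |
| 9 | NaN | NaN | NaN | NaN | NaN | 2018 | NaN |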
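
`DataFrame.mode()` returns one row per modal value, so when a column has ties (as `Year` does in the output above) the result grows extra rows and every other column is padded with `NaN`. That padding is also why `Reviews` and `Price` show up as floats (`8580.0`, `8.0`) there. A tiny hypothetical frame reproduces the effect:

```python
import pandas as pd

# Hypothetical data: "genre" has one most common value, while every "year"
# occurs exactly once, so all three years tie as modes.
df = pd.DataFrame({
    "genre": ["Fiction", "Fiction", "Non Fiction"],
    "year": [2009, 2010, 2011],
})

modes = df.mode()
print(modes)
# "genre" has a single mode, so rows 1 and 2 of that column are NaN;
# "year" fills all three rows with its tied modes.
```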


In [18]:
bestsellers['User Rating'].sum()

2540.1000000000004

In [25]:
bestsellers['Author'].count()

550

In [24]:
bestsellers['Author'].nunique()

248

In [28]:
bestsellers['Author'].mode().count()

1

In [44]:
bestsellers.describe(include=["object"])

| | Name | Author | Genre |
|---|---|---|---|
| count | 550 | 550 | 550 |
| unique | 351 | 248 | 2 |
| top | Publication Manual of the American Psychologic... | Jeff Kinney | Non Fiction |
| freq | 10 | 12 | 310 |
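
The last three Part 1 questions (missing values, the dtype of `User Rating`, and the number of integer columns) are not answered by the cells above. A minimal sketch of the relevant calls, run on a hypothetical two-row stand-in with the same columns; on the real data the same expressions would be applied to `bestsellers`:

```python
import pandas as pd

# Hypothetical stand-in with the same column layout as bestsellers.
sample = pd.DataFrame({
    "Name": ["10-Day Green Smoothie Cleanse", "11/22/63: A Novel"],
    "Author": ["JJ Smith", "Stephen King"],
    "User Rating": [4.7, 4.6],
    "Reviews": [17350, 2052],
    "Price": [8, 22],
    "Year": [2016, 2011],
    "Genre": ["Non Fiction", "Fiction"],
})

# Per-column count of missing values; any nonzero count means gaps.
null_counts = sample.isnull().sum()
cols_with_nulls = null_counts[null_counts > 0].index.tolist()

# The dtype pandas inferred for "User Rating".
rating_dtype = sample["User Rating"].dtype

# Columns whose dtype is any integer type.
int_cols = [c for c in sample.columns
            if pd.api.types.is_integer_dtype(sample[c])]

print(cols_with_nulls)  # []
print(rating_dtype)     # float64
print(int_cols)         # ['Reviews', 'Price', 'Year']
```

`sample.info()` (or `sample.dtypes`) reports the same dtype information in a single call.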


## Part 2

* The `mount_everest_deaths` dataset includes its own index column. When importing it, use that column as the DataFrame's index.
* Which columns have zero null values?
* Which column has the most null values?


In [32]:
mt_everest = pd.read_csv("data/mount_everest_deaths.csv", index_col=0)
mt_everest

| No. | Name | Date | Age | Expedition | Nationality | Cause of death | Location |
|---|---|---|---|---|---|---|---|
| 1 | Dorje | June 7, 1922 | NaN | 1922 British Mount Everest Expedition | Nepal | Avalanche | Below North Col |
| 2 | Lhakpa | June 7, 1922 | NaN | 1922 British Mount Everest Expedition | Nepal | Avalanche | Below North Col |
| 3 | Norbu | June 7, 1922 | NaN | 1922 British Mount Everest Expedition | Nepal | Avalanche | Below North Col |
| 4 | Pasang | June 7, 1922 | NaN | 1922 British Mount Everest Expedition | Nepal | Avalanche | Below North Col |
| 5 | Pema | June 7, 1922 | NaN | 1922 British Mount Everest Expedition | Nepal | Avalanche | Below North Col |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 306 | Christopher Jon Kulish | May 27, 2019 | 62.0 | Climbing the Seven Summits | United States | Cardiac event during descent | South Col |
| 307 | Puwei Liu | May 12, 2021 | 55.0 | Seven Summit Treks | United States | Exhaustion | Near South Summit |
| 308 | Abdul Waraich | May 12, 2021 | 41.0 | Seven Summit Treks | Switzerland | Exhaustion | Near South Summit |
| 309 | Pemba Tashi Sherpa | May 18, 2021 | 28.0 | Climbing the Seven Summits | Nepal | Fall into a crevasse | Between Camp I & Camp II |


In [34]:
mt_everest.shape

(310, 7)

In [36]:
mt_everest.info()

<class 'pandas.core.frame.DataFrame'>
Index: 310 entries, 1 to 310
Data columns (total 7 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   Name            310 non-null    object 
 1   Date            310 non-null    object 
 2   Age             160 non-null    float64
 3   Expedition      271 non-null    object 
 4   Nationality     309 non-null    object 
 5   Cause of death  296 non-null    object 
 6   Location        291 non-null    object 
dtypes: float64(1), object(6)
memory usage: 19.4+ KB
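
Both Part 2 questions can be read off the `info()` output above: `Name` and `Date` report 310 non-null values out of 310 entries, and `Age` has the fewest non-null values (160). They can also be computed directly. A minimal sketch on a hypothetical three-row excerpt in the same layout, read from a string so it runs without the CSV; the `No.` header is inferred from the index name in the display above:

```python
import io

import pandas as pd

# Hypothetical excerpt in the layout of mount_everest_deaths.csv;
# "No." is the dataset's own index column.
csv_text = """No.,Name,Date,Age,Expedition,Nationality,Cause of death,Location
1,Dorje,"June 7, 1922",,1922 British Mount Everest Expedition,Nepal,Avalanche,Below North Col
307,Puwei Liu,"May 12, 2021",55.0,Seven Summit Treks,United States,Exhaustion,Near South Summit
309,Pemba Tashi Sherpa,"May 18, 2021",28.0,Climbing the Seven Summits,Nepal,Fall into a crevasse,Between Camp I & Camp II
"""

# index_col=0 reuses the file's first column as the index.
everest = pd.read_csv(io.StringIO(csv_text), index_col=0)

null_counts = everest.isnull().sum()
no_nulls = null_counts[null_counts == 0].index.tolist()  # zero missing values
most_nulls = null_counts.idxmax()                        # column with most NaNs
```

Against the full file, `mt_everest.isnull().sum()` gives the per-column totals behind the `info()` summary.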


## Part 3
* Import the `movie_titles.tsv` dataset
* You'll notice that it is not comma-separated! You'll need to tell `read_csv` what the separator actually is.
* The dataset does not come with its own column headings, so you'll need to provide those as well.  The columns are, in order, `id`, `title`, `year`, `imdb_rating`, `imdb_id`, and `genres`
* Once you have successfully read the dataset into a DataFrame, inspect the last 7 rows!

In [43]:
columns = ["id", "title", "year", "imdb_rating", "imdb_id", "genres"]
movies = pd.read_csv("data/movie_titles.tsv", sep="\t", names=columns)
movies

| | id | title | year | imdb_rating | imdb_id | genres |
|---|---|---|---|---|---|---|
| 0 | m0 | 10 things i hate about you | 1999 | 6.9 | 62847.0 | ['comedy' 'romance'] |
| 1 | m1 | 1492: conquest of paradise | 1992 | 6.2 | 10421.0 | ['adventure' 'biography' 'drama' 'history'] |
| 2 | m2 | 15 minutes | 2001 | 6.1 | 25854.0 | ['action' 'crime' 'drama' 'thriller'] |
| 3 | m3 | 2001: a space odyssey | 1968 | 8.4 | 163227.0 | ['adventure' 'mystery' 'sci-fi'] |
| 4 | m4 | 48 hrs. | 1982 | 6.9 | 22289.0 | ['action' 'comedy' 'crime' 'drama' 'thriller'] |
| ... | ... | ... | ... | ... | ... | ... |
| 612 | m612 | watchmen | 2009 | 7.8 | 135229.0 | ['action' 'crime' 'fantasy' 'mystery' 'sci-fi'... |
| 613 | m613 | xxx | 2002 | 5.6 | 53505.0 | ['action' 'adventure' 'crime'] |
| 614 | m614 | x-men | 2000 | 7.4 | 122149.0 | ['action' 'sci-fi'] |
| 615 | m615 | young frankenstein | 1974 | 8.0 | 57618.0 | ['comedy' 'sci-fi'] |
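
The final task asks for the last seven rows, which `DataFrame.tail(7)` returns. A minimal sketch using a hypothetical excerpt of the rows displayed above, read from an in-memory tab-separated string (no header row, as in the real file) so it runs without `movie_titles.tsv`:

```python
import io

import pandas as pd

# Hypothetical headerless, tab-separated excerpt of movie_titles.tsv.
tsv_text = (
    "m0\t10 things i hate about you\t1999\t6.9\t62847\t['comedy' 'romance']\n"
    "m1\t1492: conquest of paradise\t1992\t6.2\t10421\t['adventure' 'biography' 'drama' 'history']\n"
    "m2\t15 minutes\t2001\t6.1\t25854\t['action' 'crime' 'drama' 'thriller']\n"
    "m3\t2001: a space odyssey\t1968\t8.4\t163227\t['adventure' 'mystery' 'sci-fi']\n"
    "m4\t48 hrs.\t1982\t6.9\t22289\t['action' 'comedy' 'crime' 'drama' 'thriller']\n"
    "m613\txxx\t2002\t5.6\t53505\t['action' 'adventure' 'crime']\n"
    "m614\tx-men\t2000\t7.4\t122149\t['action' 'sci-fi']\n"
    "m615\tyoung frankenstein\t1974\t8.0\t57618\t['comedy' 'sci-fi']\n"
)

columns = ["id", "title", "year", "imdb_rating", "imdb_id", "genres"]

# sep="\t" handles the tabs; names= supplies the missing header row.
movies = pd.read_csv(io.StringIO(tsv_text), sep="\t", names=columns)

last_seven = movies.tail(7)
```

On the full dataset the equivalent call is simply `movies.tail(7)`.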

