# Book Recommendations Dataset

This notebook uses the [Book Recommendations](https://www.kaggle.com/arashnic/book-recommendation-dataset) dataset available on Kaggle. 

## Imports

In [1]:
# TODO: Import any other modules you need
import warnings
warnings.filterwarnings('ignore')
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

In [2]:
# TODO: Read Books.csv, Ratings.csv and Users.csv into similarly named dataframes
Books_df=pd.read_csv("Books.csv")
Ratings_df= pd.read_csv("Ratings.csv")
Users_df=pd.read_csv("Users.csv")

## Exploratory Data Analysis (EDA)

### TODO: 
* Use the `describe` and `info` methods on each of your dataframes. 
* Check the `dtypes` for the columns in each dataframe.
* Check for missing values in each dataframe
* Identify any other operations commonly used during EDA and perform them here

In [3]:
Books_df.describe() #Statistical Summary of Books dataframe

Unnamed: 0,ISBN,Book-Title,Book-Author,Year-Of-Publication,Publisher,Image-URL-S,Image-URL-M,Image-URL-L
count,271360,271360,271359,271360,271358,271360,271360,271357
unique,271360,242135,102023,202,16807,271044,271044,271041
top,1591298148,Selected Poems,Agatha Christie,2002,Harlequin,http://images.amazon.com/images/P/067153145X.0...,http://images.amazon.com/images/P/156076323X.0...,http://images.amazon.com/images/P/014023313X.0...
freq,1,27,632,13903,7535,2,2,2


In [4]:
Users_df.describe() #Statistical Summary of Users dataframe

Unnamed: 0,User-ID,Age
count,278858.0,168096.0
mean,139429.5,34.751434
std,80499.51502,14.428097
min,1.0,0.0
25%,69715.25,24.0
50%,139429.5,32.0
75%,209143.75,44.0
max,278858.0,244.0


In [5]:
Ratings_df.describe() #Statistical Summary of Ratings dataframe

Unnamed: 0,User-ID,Book-Rating
count,1149780.0,1149780.0
mean,140386.4,2.86695
std,80562.28,3.854184
min,2.0,0.0
25%,70345.0,0.0
50%,141010.0,0.0
75%,211028.0,7.0
max,278854.0,10.0


In [6]:
Books_df.info() #Info on Books dataframe

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 271360 entries, 0 to 271359
Data columns (total 8 columns):
 #   Column               Non-Null Count   Dtype 
---  ------               --------------   ----- 
 0   ISBN                 271360 non-null  object
 1   Book-Title           271360 non-null  object
 2   Book-Author          271359 non-null  object
 3   Year-Of-Publication  271360 non-null  object
 4   Publisher            271358 non-null  object
 5   Image-URL-S          271360 non-null  object
 6   Image-URL-M          271360 non-null  object
 7   Image-URL-L          271357 non-null  object
dtypes: object(8)
memory usage: 16.6+ MB


In [7]:
Users_df.info() #Info on Users dataframe

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 278858 entries, 0 to 278857
Data columns (total 3 columns):
 #   Column    Non-Null Count   Dtype  
---  ------    --------------   -----  
 0   User-ID   278858 non-null  int64  
 1   Location  278858 non-null  object 
 2   Age       168096 non-null  float64
dtypes: float64(1), int64(1), object(1)
memory usage: 6.4+ MB


In [8]:
Ratings_df.describe() #Repeats the statistical summary from cell 5; Ratings_df.info() is the call that prints column info

Unnamed: 0,User-ID,Book-Rating
count,1149780.0,1149780.0
mean,140386.4,2.86695
std,80562.28,3.854184
min,2.0,0.0
25%,70345.0,0.0
50%,141010.0,0.0
75%,211028.0,7.0
max,278854.0,10.0


In [9]:
#Data Types for Books Dataframe
Books_df.dtypes

ISBN                   object
Book-Title             object
Book-Author            object
Year-Of-Publication    object
Publisher              object
Image-URL-S            object
Image-URL-M            object
Image-URL-L            object
dtype: object

In [10]:
#Data Types for Users Dataframe
Users_df.dtypes

User-ID       int64
Location     object
Age         float64
dtype: object

In [11]:
#Data Types for Rating Dataframe
Ratings_df.dtypes

User-ID         int64
ISBN           object
Book-Rating     int64
dtype: object

In [12]:
Books_df.isnull().sum()

ISBN                   0
Book-Title             0
Book-Author            1
Year-Of-Publication    0
Publisher              2
Image-URL-S            0
Image-URL-M            0
Image-URL-L            3
dtype: int64

In [13]:
Users_df.isnull().sum()

User-ID          0
Location         0
Age         110762
dtype: int64

In [14]:
Ratings_df.isnull().sum()

User-ID        0
ISBN           0
Book-Rating    0
dtype: int64
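
The raw counts above are easier to compare across dataframes of different sizes when expressed as percentages. The following is a minimal sketch, not part of the original notebook: `missing_summary` is a hypothetical helper and `toy_users` is a made-up stand-in for `Users_df`.

```python
import numpy as np
import pandas as pd

def missing_summary(df):
    """Return the count and percentage of missing values per column."""
    counts = df.isnull().sum()
    percent = (counts / len(df) * 100).round(2)
    return pd.DataFrame({"missing": counts, "percent": percent})

# Toy frame standing in for Users_df (values are invented for illustration).
toy_users = pd.DataFrame({
    "User-ID": [1, 2, 3, 4],
    "Age": [np.nan, 18.0, np.nan, 17.0],
})

summary = missing_summary(toy_users)
print(summary)
```

Calling `missing_summary(Users_df)` on the real data should report the same 110762 missing ages alongside their share of all rows.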

In [15]:
Ratings_df['Book-Rating'].value_counts()

0     716109
8     103736
10     78610
7      76457
9      67541
5      50974
6      36924
4       8904
3       5996
2       2759
1       1770
Name: Book-Rating, dtype: int64
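
A rating of 0 accounts for 716109 of the 1149780 rows, which pulls the overall mean down to 2.87. If 0 is taken to mean "no explicit rating" (an assumption here, to be checked against the dataset description), it may be worth looking at the non-zero ratings separately. A minimal sketch on made-up data (`toy_ratings` is hypothetical):

```python
import pandas as pd

# Toy stand-in for Ratings_df; the values are invented for illustration.
toy_ratings = pd.DataFrame({"Book-Rating": [0, 0, 0, 8, 10, 7]})

# Share of rows whose rating is 0.
zero_share = (toy_ratings["Book-Rating"] == 0).mean()

# Mean over the non-zero ratings only (assumption: 0 is not a real score).
explicit = toy_ratings[toy_ratings["Book-Rating"] > 0]
explicit_mean = explicit["Book-Rating"].mean()

print(f"share of zeros: {zero_share:.2f}")
print(f"mean of non-zero ratings: {explicit_mean:.2f}")
```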

## Fact-Finding

You may need to apply the `groupby` method here. Here's a great introduction: [Pandas GroupBy: Your Guide to Grouping Data in Python](https://realpython.com/pandas-groupby/).

### TODO: 
* Group the `ratings` dataframe by `User-ID` and check the `average` rating by user.
* Group the `ratings` dataframe by `ISBN` and check the `average` rating each book has gotten.
* Create a `Series` containing the number of books each user has read
* Who are the top 10 most rated authors?
* Who are the top 10 highest rated authors?

In [16]:
Ratings_df.groupby(['User-ID'])['Book-Rating'].mean()

User-ID
2         0.000000
7         0.000000
8         2.166667
9         2.000000
10        3.000000
            ...   
278846    4.000000
278849    2.250000
278851    3.956522
278852    8.000000
278854    5.250000
Name: Book-Rating, Length: 105283, dtype: float64

In [17]:
Ratings_df.groupby(['ISBN'])['Book-Rating'].mean()

ISBN
 0330299891    3.0
 0375404120    1.5
 0586045007    0.0
 9022906116    3.5
 9032803328    0.0
              ... 
cn113107       0.0
ooo7156103     7.0
§423350229     0.0
´3499128624    8.0
Ô½crosoft      7.0
Name: Book-Rating, Length: 340556, dtype: float64

In [18]:
read_by_user=Ratings_df.groupby(['User-ID'])['ISBN'].count()
print (read_by_user)
type(read_by_user)

User-ID
2          1
7          1
8         18
9          3
10         2
          ..
278846     2
278849     4
278851    23
278852     1
278854     8
Name: ISBN, Length: 105283, dtype: int64


pandas.core.series.Series

In [28]:
Author_Ratings=pd.merge(Books_df,Ratings_df, on='ISBN')
Author_Ratings.head()

Unnamed: 0,ISBN,Book-Title,Book-Author,Year-Of-Publication,Publisher,Image-URL-S,Image-URL-M,Image-URL-L,User-ID,Book-Rating
0,0195153448,Classical Mythology,Mark P. O. Morford,2002,Oxford University Press,http://images.amazon.com/images/P/0195153448.0...,http://images.amazon.com/images/P/0195153448.0...,http://images.amazon.com/images/P/0195153448.0...,2,0
1,0002005018,Clara Callan,Richard Bruce Wright,2001,HarperFlamingo Canada,http://images.amazon.com/images/P/0002005018.0...,http://images.amazon.com/images/P/0002005018.0...,http://images.amazon.com/images/P/0002005018.0...,8,5
2,0002005018,Clara Callan,Richard Bruce Wright,2001,HarperFlamingo Canada,http://images.amazon.com/images/P/0002005018.0...,http://images.amazon.com/images/P/0002005018.0...,http://images.amazon.com/images/P/0002005018.0...,11400,0
3,0002005018,Clara Callan,Richard Bruce Wright,2001,HarperFlamingo Canada,http://images.amazon.com/images/P/0002005018.0...,http://images.amazon.com/images/P/0002005018.0...,http://images.amazon.com/images/P/0002005018.0...,11676,8
4,0002005018,Clara Callan,Richard Bruce Wright,2001,HarperFlamingo Canada,http://images.amazon.com/images/P/0002005018.0...,http://images.amazon.com/images/P/0002005018.0...,http://images.amazon.com/images/P/0002005018.0...,41385,0


In [31]:
Author_Ratings.groupby(['Book-Rating'])['Book-Author'].value_counts()

Book-Rating  Book-Author            
0            Nora Roberts               5491
             Stephen King               5414
             John Grisham               3460
             James Patterson            3458
             Mary Higgins Clark         3100
                                        ... 
10           phyllis reynolds naylor       1
             retold by Dandi               1
             various authors               1
             x x                           1
             Ã?Â?dÃ?Â¶n von Horvath        1
Name: Book-Author, Length: 198919, dtype: int64
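
The output above counts authors within each rating value, so it does not directly list the ten most rated authors. One way to get that list, sketched here on made-up data (`toy` is a hypothetical stand-in for `Author_Ratings`), is to count rows per author across all ratings:

```python
import pandas as pd

# Toy stand-in for Author_Ratings; authors and ratings are invented.
toy = pd.DataFrame({
    "Book-Author": ["A", "A", "A", "B", "B", "C"],
    "Book-Rating": [0, 8, 10, 7, 0, 9],
})

# value_counts sorts by frequency, so head(10) gives the most rated authors.
most_rated = toy["Book-Author"].value_counts().head(10)
print(most_rated)
```

On the real data the equivalent call would be `Author_Ratings['Book-Author'].value_counts().head(10)`. Whether rows with a rating of 0 should count as "rated" is a judgment call this sketch leaves open.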

In [95]:
#Row index of each author's highest rating (this does not rank authors by rating)
Author_Ratings.groupby(['Book-Author'])['Book-Rating'].idxmax()

Book-Author
 D. Chiel                           364116
 J. D. Landis                       989720
 Mimma Balia                        776095
'N Sync                             755515
142 moms from all over the world    979436
                                     ...  
Â¢ystein LÂ²nn                      964246
Ã?dÃ¶n von HorvÃ¡th                 894293
Ã?Â?dÃ?Â¶n von Horvath              934698
Ã?Â?pictÃ?Â¨te                      846044
Ã?Â?ric Holder                      494560
Name: Book-Rating, Length: 101588, dtype: int64
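
`idxmax` returns the row label of each author's highest rating, which is not a ranking of authors. A possible alternative, sketched on made-up data, ranks authors by mean rating. Two choices below are assumptions rather than part of the original notebook: ratings of 0 are excluded, and `min_ratings` is a hypothetical threshold so that an author with a single 10 does not top the list.

```python
import pandas as pd

# Toy stand-in for Author_Ratings; authors and ratings are invented.
toy = pd.DataFrame({
    "Book-Author": ["A", "A", "A", "B", "B", "C"],
    "Book-Rating": [8, 9, 10, 7, 5, 10],
})

# Assumption: a rating of 0 is not a real score, so leave it out.
explicit = toy[toy["Book-Rating"] > 0]

# Mean rating and number of ratings per author.
stats = explicit.groupby("Book-Author")["Book-Rating"].agg(["mean", "count"])

min_ratings = 2  # hypothetical threshold; tune for the real data
highest_rated = (
    stats[stats["count"] >= min_ratings]
    .sort_values("mean", ascending=False)
    .head(10)
)
print(highest_rated)
```

Author C has the highest mean here but only one rating, so the threshold removes it from the ranking.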

## Engineering

### TODO: 
* Apply the `to_datetime` function to the `Year-Of-Publication` column and assign the result back to the dataframe (`to_datetime` has no `inplace` option).
* Use the `drop` method to drop all the `Image` columns in the `books` dataframe.
* Read this article - [How to Split String Column in Pandas into Multiple Columns](https://www.statology.org/pandas-split-column/) then apply the method to split the `location` column into three columns `city`, `state`, `country`.
* Which other `str` methods are available in pandas? Try out one or two others.
* Merge the `ratings` and `books` dataframes into one dataframe named `books_and_ratings`
* Merge the `books_and_ratings` dataframe and the `users` dataframe into a dataframe named `user_preferences`

In [65]:
Books_df.drop(['Image-URL-S','Image-URL-M','Image-URL-L'],axis=1,inplace=True) #axis must be passed by keyword in recent pandas

In [66]:
books_and_ratings=pd.merge(Books_df,Ratings_df,on='ISBN')

In [67]:
user_preferences=pd.merge(books_and_ratings,Users_df,on='User-ID')

In [68]:
user_preferences.head()

Unnamed: 0,ISBN,Book-Title,Book-Author,Year-Of-Publication,Publisher,User-ID,Book-Rating,Location,Age
0,0195153448,Classical Mythology,Mark P. O. Morford,2002,Oxford University Press,2,0,"stockton, california, usa",18.0
1,0002005018,Clara Callan,Richard Bruce Wright,2001,HarperFlamingo Canada,8,5,"timmins, ontario, canada",
2,0060973129,Decision in Normandy,Carlo D'Este,1991,HarperPerennial,8,0,"timmins, ontario, canada",
3,0374157065,Flu: The Story of the Great Influenza Pandemic...,Gina Bari Kolata,1999,Farrar Straus Giroux,8,0,"timmins, ontario, canada",
4,0393045218,The Mummies of Urumchi,E. J. W. Barber,1999,W. W. Norton &amp; Company,8,0,"timmins, ontario, canada",


In [69]:
Users_df[['City','State','Country']] = Users_df['Location'].str.split(',',n=2,expand=True) #n must be passed by keyword in recent pandas

In [70]:
Users_df['City']=Users_df['City'].str.capitalize()
Users_df['Country']=Users_df['Country'].str.upper()

In [71]:
Users_df.head()

Unnamed: 0,User-ID,Location,Age,City,State,Country
0,1,"nyc, new york, usa",,Nyc,new york,USA
1,2,"stockton, california, usa",18.0,Stockton,california,USA
2,3,"moscow, yukon territory, russia",,Moscow,yukon territory,RUSSIA
3,4,"porto, v.n.gaia, portugal",17.0,Porto,v.n.gaia,PORTUGAL
4,5,"farnborough, hants, united kingdom",,Farnborough,hants,UNITED KINGDOM
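
Because the separator used above is a bare comma, the pieces after the first one likely keep the space that followed each comma in `Location`. A minimal sketch of a few more `str` methods (`strip`, `title`, `upper`) that tidy the split columns; `toy_users` is a made-up stand-in for `Users_df`:

```python
import pandas as pd

# Toy stand-in for Users_df; two locations copied from the output above.
toy_users = pd.DataFrame({
    "Location": ["nyc, new york, usa", "porto, v.n.gaia, portugal"],
})

parts = toy_users["Location"].str.split(",", n=2, expand=True)
parts.columns = ["City", "State", "Country"]

# Splitting on ',' alone leaves the space that followed the comma.
raw_state = parts["State"].iloc[0]

# Strip surrounding whitespace from every column, then fix the casing.
parts = parts.apply(lambda col: col.str.strip())
parts["City"] = parts["City"].str.title()
parts["State"] = parts["State"].str.title()
parts["Country"] = parts["Country"].str.upper()
print(parts)
```

Other methods under the same `.str` accessor include `contains`, `startswith`, `replace` and `len`.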


In [72]:
Year=Books_df['Year-Of-Publication'].replace({0:np.nan,'0':np.nan,'DK Publishing Inc':np.nan,'Gallimard':np.nan}) #Mark placeholder and misplaced values as missing
Year=Year.dropna()
Year=pd.to_datetime(Year.astype(str),format='%Y',errors='coerce') #Parse four-digit years; values that cannot be parsed become NaT
Year=Year.dt.year

In [73]:
Books_df['Year']=Year

In [74]:
Books_df.dropna(inplace=True)

In [75]:
Books_df['Year']=Books_df['Year'].astype('int')

In [76]:
Books_df.drop(['Year-Of-Publication'],axis=1,inplace=True)

In [77]:
Books_df.head()

Unnamed: 0,ISBN,Book-Title,Book-Author,Publisher,Year
0,0195153448,Classical Mythology,Mark P. O. Morford,Oxford University Press,2002
1,0002005018,Clara Callan,Richard Bruce Wright,HarperFlamingo Canada,2001
2,0060973129,Decision in Normandy,Carlo D'Este,HarperPerennial,1991
3,0374157065,Flu: The Story of the Great Influenza Pandemic...,Gina Bari Kolata,Farrar Straus Giroux,1999
4,0393045218,The Mummies of Urumchi,E. J. W. Barber,W. W. Norton &amp; Company,1999


## Querying and Indexing 

Here's a useful article:
* [The Unreasonable Effectiveness of Method Chaining in Pandas](https://towardsdatascience.com/the-unreasonable-effectiveness-of-method-chaining-in-pandas-15c2109e3c69)

### TODO:
* Use the `query` method to return rows from the `user_preferences` dataframe where `User-ID` is equal to `276822`
* Repeat the same operation as before but this time return rows where `ISBN` is equal to `0345260317`
* Return rows where the `ISBN` is equal to `0345260317` and `Book-Rating` is greater than 0.
* Repeat all the operations performed above using the `loc` method rather than `query`
* Time the operation of both methods (See [Profiling and Timing Code](https://jakevdp.github.io/PythonDataScienceHandbook/01.07-timing-and-profiling.html)). Which is faster?
* Use the `to_dict` method to convert the output of query to dictionary. 
* Write a function that takes in a `dataframe` and a `user_id` and returns a dictionary of rows with matching user ids. Test your function using the `user_preferences` dataframe and any user id in the data.

In [78]:
user_preferences[user_preferences['User-ID']==276822]

Unnamed: 0,ISBN,Book-Title,Book-Author,Year-Of-Publication,Publisher,User-ID,Book-Rating,Location,Age
826597,0375821813,Hoot (Newbery Honor Book),CARL HIAASEN,2002,Knopf Books for Young Readers,276822,9,"calgary, alberta, canada",11.0
826598,0786817070,"Artemis Fowl (Artemis Fowl, Book 1)",Eoin Colfer,2002,Miramax Kids,276822,10,"calgary, alberta, canada",11.0
826599,0060096195,The Boy Next Door,Meggin Cabot,2002,Avon Trade,276822,10,"calgary, alberta, canada",11.0
826600,038076041X,A Kid's Guide to How to Save the Planet (Camel...,Billy Goodman,1990,Harpercollins Juvenile Books,276822,10,"calgary, alberta, canada",11.0
826601,0439087597,The Last Book in the Universe,Rodman Philbrick,2002,Scholastic Signature,276822,0,"calgary, alberta, canada",11.0
826602,0141310340,Skin and Other Stories (Now in Speak!),Roald Dahl,2002,Puffin Books,276822,9,"calgary, alberta, canada",11.0
826603,0805057706,The Number Devil: A Mathematical Adventure,Hans Magnus Enzensberger,1998,Henry Holt &amp; Company,276822,10,"calgary, alberta, canada",11.0
826604,0689804458,A String in the Harp,Nancy Bond,1996,Aladdin,276822,8,"calgary, alberta, canada",11.0
826605,0786812508,The Sandy Bottom Orchestra,Jenny Nilson,1998,Hyperion Books for Children,276822,9,"calgary, alberta, canada",11.0
826606,0439401399,The Contest,Gordon Korman,2002,Scholastic,276822,6,"calgary, alberta, canada",11.0


In [79]:
user_preferences[user_preferences['ISBN']=='0345260317']

Unnamed: 0,ISBN,Book-Title,Book-Author,Year-Of-Publication,Publisher,User-ID,Book-Rating,Location,Age
72650,0345260317,The Dragons of Eden: Speculations on the Evolu...,Carl Sagan,1978,Ballantine Books,102967,0,"olympia, washington, usa",54.0
128440,0345260317,The Dragons of Eden: Speculations on the Evolu...,Carl Sagan,1978,Ballantine Books,240144,0,"muskego, wisconsin, usa",34.0
219013,0345260317,The Dragons of Eden: Speculations on the Evolu...,Carl Sagan,1978,Ballantine Books,32773,0,"olathe, kansas, usa",28.0
266779,0345260317,The Dragons of Eden: Speculations on the Evolu...,Carl Sagan,1978,Ballantine Books,96440,0,"rochester, new york, usa",29.0
282809,0345260317,The Dragons of Eden: Speculations on the Evolu...,Carl Sagan,1978,Ballantine Books,130474,0,"nashville, tennessee, usa",33.0
300391,0345260317,The Dragons of Eden: Speculations on the Evolu...,Carl Sagan,1978,Ballantine Books,162155,0,"federal way, washington, usa",57.0
327827,0345260317,The Dragons of Eden: Speculations on the Evolu...,Carl Sagan,1978,Ballantine Books,196160,0,"auburn, alabama, usa",22.0
424832,0345260317,The Dragons of Eden: Speculations on the Evolu...,Carl Sagan,1978,Ballantine Books,255489,0,"sherman oaks, california, usa",
540857,0345260317,The Dragons of Eden: Speculations on the Evolu...,Carl Sagan,1978,Ballantine Books,67,0,"framingham, massachusetts, usa",43.0
540861,0345260317,The Dragons of Eden: Speculations on the Evolu...,Carl Sagan,1978,Ballantine Books,32195,0,"fall river, massachusetts, usa",


In [80]:
user_preferences[(user_preferences['ISBN']=='0345260317') & (user_preferences['Book-Rating'] >0)]

Unnamed: 0,ISBN,Book-Title,Book-Author,Year-Of-Publication,Publisher,User-ID,Book-Rating,Location,Age
541614,0345260317,The Dragons of Eden: Speculations on the Evolu...,Carl Sagan,1978,Ballantine Books,64010,7,"barrie, ontario, canada",
541640,0345260317,The Dragons of Eden: Speculations on the Evolu...,Carl Sagan,1978,Ballantine Books,152435,7,"indianapolis, indiana, usa",40.0
541800,0345260317,The Dragons of Eden: Speculations on the Evolu...,Carl Sagan,1978,Ballantine Books,245988,3,"regina, saskatchewan, canada",
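
The three filters above use boolean masks. The `query` method named in the TODO expresses the same conditions as a string. A minimal sketch on made-up rows (`toy` is a hypothetical stand-in for `user_preferences`); column names containing a hyphen, such as `User-ID`, need backticks inside the query string:

```python
import pandas as pd

# Toy stand-in for user_preferences; rows are invented for illustration.
toy = pd.DataFrame({
    "ISBN": ["0345260317", "0345260317", "0375821813"],
    "User-ID": [64010, 102967, 276822],
    "Book-Rating": [7, 0, 9],
})

# Backticks let query() refer to column names that are not valid identifiers.
by_user = toy.query("`User-ID` == 276822")

# String values are quoted inside the expression.
by_isbn = toy.query("ISBN == '0345260317'")

# '@' pulls in a Python variable; 'and' combines conditions.
target = "0345260317"
rated = toy.query("ISBN == @target and `Book-Rating` > 0")
print(rated)
```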


In [83]:
user_preferences.loc[(user_preferences['User-ID']==276822)]

Unnamed: 0,ISBN,Book-Title,Book-Author,Year-Of-Publication,Publisher,User-ID,Book-Rating,Location,Age
826597,0375821813,Hoot (Newbery Honor Book),CARL HIAASEN,2002,Knopf Books for Young Readers,276822,9,"calgary, alberta, canada",11.0
826598,0786817070,"Artemis Fowl (Artemis Fowl, Book 1)",Eoin Colfer,2002,Miramax Kids,276822,10,"calgary, alberta, canada",11.0
826599,0060096195,The Boy Next Door,Meggin Cabot,2002,Avon Trade,276822,10,"calgary, alberta, canada",11.0
826600,038076041X,A Kid's Guide to How to Save the Planet (Camel...,Billy Goodman,1990,Harpercollins Juvenile Books,276822,10,"calgary, alberta, canada",11.0
826601,0439087597,The Last Book in the Universe,Rodman Philbrick,2002,Scholastic Signature,276822,0,"calgary, alberta, canada",11.0
826602,0141310340,Skin and Other Stories (Now in Speak!),Roald Dahl,2002,Puffin Books,276822,9,"calgary, alberta, canada",11.0
826603,0805057706,The Number Devil: A Mathematical Adventure,Hans Magnus Enzensberger,1998,Henry Holt &amp; Company,276822,10,"calgary, alberta, canada",11.0
826604,0689804458,A String in the Harp,Nancy Bond,1996,Aladdin,276822,8,"calgary, alberta, canada",11.0
826605,0786812508,The Sandy Bottom Orchestra,Jenny Nilson,1998,Hyperion Books for Children,276822,9,"calgary, alberta, canada",11.0
826606,0439401399,The Contest,Gordon Korman,2002,Scholastic,276822,6,"calgary, alberta, canada",11.0


In [84]:
user_preferences.loc[(user_preferences['ISBN']=='0345260317')]

Unnamed: 0,ISBN,Book-Title,Book-Author,Year-Of-Publication,Publisher,User-ID,Book-Rating,Location,Age
72650,0345260317,The Dragons of Eden: Speculations on the Evolu...,Carl Sagan,1978,Ballantine Books,102967,0,"olympia, washington, usa",54.0
128440,0345260317,The Dragons of Eden: Speculations on the Evolu...,Carl Sagan,1978,Ballantine Books,240144,0,"muskego, wisconsin, usa",34.0
219013,0345260317,The Dragons of Eden: Speculations on the Evolu...,Carl Sagan,1978,Ballantine Books,32773,0,"olathe, kansas, usa",28.0
266779,0345260317,The Dragons of Eden: Speculations on the Evolu...,Carl Sagan,1978,Ballantine Books,96440,0,"rochester, new york, usa",29.0
282809,0345260317,The Dragons of Eden: Speculations on the Evolu...,Carl Sagan,1978,Ballantine Books,130474,0,"nashville, tennessee, usa",33.0
300391,0345260317,The Dragons of Eden: Speculations on the Evolu...,Carl Sagan,1978,Ballantine Books,162155,0,"federal way, washington, usa",57.0
327827,0345260317,The Dragons of Eden: Speculations on the Evolu...,Carl Sagan,1978,Ballantine Books,196160,0,"auburn, alabama, usa",22.0
424832,0345260317,The Dragons of Eden: Speculations on the Evolu...,Carl Sagan,1978,Ballantine Books,255489,0,"sherman oaks, california, usa",
540857,0345260317,The Dragons of Eden: Speculations on the Evolu...,Carl Sagan,1978,Ballantine Books,67,0,"framingham, massachusetts, usa",43.0
540861,0345260317,The Dragons of Eden: Speculations on the Evolu...,Carl Sagan,1978,Ballantine Books,32195,0,"fall river, massachusetts, usa",


In [86]:
user_preferences.loc[((user_preferences['ISBN']=='0345260317') & (user_preferences['Book-Rating']>0))]

Unnamed: 0,ISBN,Book-Title,Book-Author,Year-Of-Publication,Publisher,User-ID,Book-Rating,Location,Age
541614,0345260317,The Dragons of Eden: Speculations on the Evolu...,Carl Sagan,1978,Ballantine Books,64010,7,"barrie, ontario, canada",
541640,0345260317,The Dragons of Eden: Speculations on the Evolu...,Carl Sagan,1978,Ballantine Books,152435,7,"indianapolis, indiana, usa",40.0
541800,0345260317,The Dragons of Eden: Speculations on the Evolu...,Carl Sagan,1978,Ballantine Books,245988,3,"regina, saskatchewan, canada",


In [87]:
%timeit user_preferences[(user_preferences['ISBN']=='0345260317') & (user_preferences['Book-Rating'] >0)]

223 ms ± 47.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [88]:
%timeit user_preferences.loc[((user_preferences['ISBN']=='0345260317') & (user_preferences['Book-Rating']>0))]

206 ms ± 12.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
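
The two `%timeit` cells compare plain boolean indexing with `.loc`; neither times `query`. Outside a notebook, the standard-library `timeit` module can compare all three. A minimal sketch on a small made-up frame (`toy` and the three helper functions are hypothetical), so absolute numbers will not match the ones above:

```python
import timeit
import pandas as pd

# Toy frame of 10000 rows; the values are invented for illustration.
toy = pd.DataFrame({
    "ISBN": ["0345260317", "0375821813"] * 5000,
    "Book-Rating": [7, 0, 0, 9] * 2500,
})

def with_mask():
    return toy[(toy["ISBN"] == "0345260317") & (toy["Book-Rating"] > 0)]

def with_loc():
    return toy.loc[(toy["ISBN"] == "0345260317") & (toy["Book-Rating"] > 0)]

def with_query():
    return toy.query("ISBN == '0345260317' and `Book-Rating` > 0")

# Best of 5 repeats of 20 calls each, in seconds.
timings = {
    func.__name__: min(timeit.repeat(func, number=20, repeat=5))
    for func in (with_mask, with_loc, with_query)
}
for name, seconds in timings.items():
    print(f"{name}: {seconds:.4f} s for 20 calls")
```

Taking the minimum over repeats reduces the influence of background noise, which matters when the candidates are as close as the two measurements above.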


In [89]:
user_preferences[(user_preferences['ISBN']=='0345260317') & (user_preferences['Book-Rating'] >0)].to_dict()

{'ISBN': {541614: '0345260317', 541640: '0345260317', 541800: '0345260317'},
 'Book-Title': {541614: 'The Dragons of Eden: Speculations on the Evolution of Human Intelligence',
  541640: 'The Dragons of Eden: Speculations on the Evolution of Human Intelligence',
  541800: 'The Dragons of Eden: Speculations on the Evolution of Human Intelligence'},
 'Book-Author': {541614: 'Carl Sagan',
  541640: 'Carl Sagan',
  541800: 'Carl Sagan'},
 'Year-Of-Publication': {541614: 1978, 541640: 1978, 541800: 1978},
 'Publisher': {541614: 'Ballantine Books',
  541640: 'Ballantine Books',
  541800: 'Ballantine Books'},
 'User-ID': {541614: 64010, 541640: 152435, 541800: 245988},
 'Book-Rating': {541614: 7, 541640: 7, 541800: 3},
 'Location': {541614: 'barrie, ontario, canada',
  541640: 'indianapolis, indiana, usa',
  541800: 'regina, saskatchewan, canada'},
 'Age': {541614: nan, 541640: 40.0, 541800: nan}}

In [93]:
def query(df,user_id):
    result=df.loc[(df['User-ID']==user_id)]
    return result

In [94]:
query(user_preferences,276822)

Unnamed: 0,ISBN,Book-Title,Book-Author,Year-Of-Publication,Publisher,User-ID,Book-Rating,Location,Age
826597,0375821813,Hoot (Newbery Honor Book),CARL HIAASEN,2002,Knopf Books for Young Readers,276822,9,"calgary, alberta, canada",11.0
826598,0786817070,"Artemis Fowl (Artemis Fowl, Book 1)",Eoin Colfer,2002,Miramax Kids,276822,10,"calgary, alberta, canada",11.0
826599,0060096195,The Boy Next Door,Meggin Cabot,2002,Avon Trade,276822,10,"calgary, alberta, canada",11.0
826600,038076041X,A Kid's Guide to How to Save the Planet (Camel...,Billy Goodman,1990,Harpercollins Juvenile Books,276822,10,"calgary, alberta, canada",11.0
826601,0439087597,The Last Book in the Universe,Rodman Philbrick,2002,Scholastic Signature,276822,0,"calgary, alberta, canada",11.0
826602,0141310340,Skin and Other Stories (Now in Speak!),Roald Dahl,2002,Puffin Books,276822,9,"calgary, alberta, canada",11.0
826603,0805057706,The Number Devil: A Mathematical Adventure,Hans Magnus Enzensberger,1998,Henry Holt &amp; Company,276822,10,"calgary, alberta, canada",11.0
826604,0689804458,A String in the Harp,Nancy Bond,1996,Aladdin,276822,8,"calgary, alberta, canada",11.0
826605,0786812508,The Sandy Bottom Orchestra,Jenny Nilson,1998,Hyperion Books for Children,276822,9,"calgary, alberta, canada",11.0
826606,0439401399,The Contest,Gordon Korman,2002,Scholastic,276822,6,"calgary, alberta, canada",11.0
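
The function above returns a dataframe, whereas the TODO asks for a dictionary. One possible variant, sketched here under the assumption that a dictionary keyed by row label is wanted, chains `to_dict(orient="index")` onto the filter. `rows_for_user` is a hypothetical name and `toy` is made-up data:

```python
import pandas as pd

def rows_for_user(df, user_id):
    """Return rows whose User-ID equals user_id as {row_label: {column: value}}."""
    matches = df.loc[df["User-ID"] == user_id]
    return matches.to_dict(orient="index")

# Toy stand-in for user_preferences; rows are invented for illustration.
toy = pd.DataFrame({
    "ISBN": ["0195153448", "0002005018", "0060973129"],
    "User-ID": [2, 8, 8],
    "Book-Rating": [0, 5, 0],
})

result = rows_for_user(toy, 8)
print(result)
```

A user ID with no matching rows yields an empty dictionary rather than an error.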


### Hierarchical Indexing

Study the [Hierarchical Indexing](https://jakevdp.github.io/PythonDataScienceHandbook/03.05-hierarchical-indexing.html) chapter from Python Data Science Handbook.
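
As a small illustration of the idea, grouping by two keys produces a `Series` with a two-level `MultiIndex`. The sketch below uses made-up rows; the `Country` column is an assumption, since the notebook's `user_preferences` was merged before `Location` was split.

```python
import pandas as pd

# Toy data; rows and the Country column are invented for illustration.
toy = pd.DataFrame({
    "Country": ["USA", "USA", "CANADA", "CANADA"],
    "Book-Author": ["Carl Sagan", "Roald Dahl", "Carl Sagan", "Carl Sagan"],
    "Book-Rating": [7, 9, 3, 7],
})

# Two grouping keys give a Series indexed by (Country, Book-Author).
by_country_author = toy.groupby(["Country", "Book-Author"])["Book-Rating"].mean()

# Selecting on the outer level returns the inner level as the index.
canada = by_country_author.loc["CANADA"]

# A tuple selects a single value across both levels.
sagan_usa = by_country_author.loc[("USA", "Carl Sagan")]

# unstack() pivots the inner level into columns; missing pairs become NaN.
wide = by_country_author.unstack()
print(wide)
```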

### TODO
* Include 3 or 4 of your observations in the `Comments` below after exploring the dataset

### Comments
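* The `query` function returns 10 rows for User-ID 276822, so this user has 10 rated books in `user_preferences` (the merge drops ratings whose ISBN is missing from Books.csv). The number of rows returned gives the same count for any other user.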

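* In the timings above, `.loc` (206 ms ± 12.4 ms) was marginally faster than plain boolean indexing (223 ms ± 47.7 ms), but the difference is within the measured spread. The `query` method itself was not timed.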

* The Year-Of-Publication column of the Books.csv dataset contains some invalid entries, e.g. `DK Publishing Inc`, `Gallimard`, `0` and `'0'`.

* The Users.csv dataset has a large number of missing values in the Age column (110762 of 278858 rows, about 40%).