### Pandas functions

In [1]:
import pandas as pd

In [2]:
df_books = pd.read_csv('/work/bestsellers.csv', sep=',', header=0)
df_books.head(5)

Unnamed: 0,Name,Author,User Rating,Reviews,Price,Year,Genre
0,10-Day Green Smoothie Cleanse,JJ Smith,4.7,17350,8,2016,Non Fiction
1,11/22/63: A Novel,Stephen King,4.6,2052,22,2011,Fiction
2,12 Rules for Life: An Antidote to Chaos,Jordan B. Peterson,4.7,18979,15,2018,Non Fiction
3,1984 (Signet Classics),George Orwell,4.7,21424,6,2017,Fiction
4,"5,000 Awesome Facts (About Everything!) (Natio...",National Geographic Kids,4.8,7665,12,2019,Non Fiction


The info() method in Pandas prints a concise summary of a DataFrame: the index range, the name, non-null count, and data type of each column, and the approximate memory usage.

In [3]:
df_books.info()


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 550 entries, 0 to 549
Data columns (total 7 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   Name         550 non-null    object 
 1   Author       550 non-null    object 
 2   User Rating  550 non-null    float64
 3   Reviews      550 non-null    int64  
 4   Price        550 non-null    int64  
 5   Year         550 non-null    int64  
 6   Genre        550 non-null    object 
dtypes: float64(1), int64(3), object(3)
memory usage: 30.2+ KB
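
As a minimal sketch, the same facts that info() prints can also be read as regular objects. The `sample` DataFrame below is a small hypothetical stand-in for `df_books`, with one missing value added on purpose.

```python
import io

import pandas as pd

# Hypothetical sample data; the None is there to show a missing value.
sample = pd.DataFrame({
    "Name": ["1984 (Signet Classics)", "Guts", None],
    "Price": [6, 7, 9],
})

# The facts info() reports, available as Series you can work with.
dtypes = sample.dtypes            # data type of each column
non_null = sample.notna().sum()   # non-null count of each column

# info() prints to stdout by default; buf= redirects the text instead.
buffer = io.StringIO()
sample.info(buf=buffer)
summary_text = buffer.getvalue()
```

Capturing the text through `buf` can be handy when the summary needs to go into a log file rather than the notebook output.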


The describe() method in Pandas generates descriptive statistics for a DataFrame. By default it covers only the numeric columns and reports the count, mean, standard deviation, minimum, maximum, and the 25th, 50th, and 75th percentiles.

In [4]:
df_books.describe()

Unnamed: 0,User Rating,Reviews,Price,Year
count,550.0,550.0,550.0,550.0
mean,4.618364,11953.281818,13.1,2014.0
std,0.22698,11731.132017,10.842262,3.165156
min,3.3,37.0,0.0,2009.0
25%,4.5,4058.0,7.0,2011.0
50%,4.7,8580.0,11.0,2014.0
75%,4.8,17253.25,16.0,2017.0
max,4.9,87841.0,105.0,2019.0
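
The sketch below shows two variations on describe(), using a small hypothetical `sample` DataFrame rather than the full dataset: summarizing a text column, and asking for different percentiles.

```python
import pandas as pd

# Hypothetical sample data with one text column and one numeric column.
sample = pd.DataFrame({
    "Genre": ["Non Fiction", "Fiction", "Fiction", "Non Fiction", "Non Fiction"],
    "Price": [8, 22, 6, 9, 7],
})

# On a DataFrame, only numeric columns are summarized by default.
numeric_stats = sample.describe()

# On a text Series, describe() reports count, unique, top, and freq.
text_stats = sample["Genre"].describe()

# percentiles= replaces the default 25%/50%/75% quartiles.
custom_stats = sample["Price"].describe(percentiles=[0.1, 0.9])
```

Here `text_stats["top"]` is the most frequent genre and `text_stats["freq"]` is how often it occurs.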


The tail() method in Pandas returns the last n rows of a DataFrame, where n is the argument passed to the method (by default, n=5).

In [5]:
df_books.tail(2)

Unnamed: 0,Name,Author,User Rating,Reviews,Price,Year,Genre
548,You Are a Badass: How to Stop Doubting Your Gr...,Jen Sincero,4.7,14331,8,2018,Non Fiction
549,You Are a Badass: How to Stop Doubting Your Gr...,Jen Sincero,4.7,14331,8,2019,Non Fiction
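
A short sketch of how the n argument behaves, on a hypothetical five-row `sample` DataFrame:

```python
import pandas as pd

# Hypothetical sample data with a default 0..4 index.
sample = pd.DataFrame({"Year": [2009, 2011, 2016, 2017, 2019]})

last_two = sample.tail(2)         # the last two rows
default_tail = sample.tail()      # n defaults to 5, so all five rows here
all_but_first = sample.tail(-1)   # negative n: every row except the first |n|
```

The original index labels are kept, so `last_two` still carries the labels 3 and 4.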


memory_usage() returns the memory usage of each column of a Pandas DataFrame in bytes, along with the memory usage of the index.

With deep=True, Pandas introspects object-dtype columns and adds the memory consumed by the Python objects they reference, such as the strings in Name, Author, and Genre.

With deep=False (the default), only the memory of the underlying arrays is counted, so an object column contributes just 8 bytes per row for the references it stores. The output below reports 550 × 8 = 4400 bytes for every column, including the object columns, which corresponds to deep=False.

In [7]:
df_books.memory_usage(deep=False)

Index           128
Name           4400
Author         4400
User Rating    4400
Reviews        4400
Price          4400
Year           4400
Genre          4400
dtype: int64
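
To make the difference concrete, the sketch below compares both settings on a small hypothetical `sample` DataFrame. The text column is forced to object dtype as an assumption, so that it matches the dtypes reported by info() above.

```python
import pandas as pd

# Hypothetical sample data; dtype="object" mirrors the object columns of df_books.
sample = pd.DataFrame({
    "Name": pd.Series(
        ["1984 (Signet Classics)", "Guts", "The Last Lecture"], dtype="object"
    ),
    "Year": [2017, 2019, 2009],
})

shallow = sample.memory_usage(deep=False)  # object columns: references only
deep = sample.memory_usage(deep=True)      # object columns: references + strings

# Numeric columns are unaffected; only the object column grows.
extra_bytes_for_strings = deep["Name"] - shallow["Name"]
```

For a one-number total, `sample.memory_usage(deep=True).sum()` adds up the per-column values.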

value_counts() is a method in Pandas that counts the unique values in a Series. It returns a new Series whose index holds the unique values and whose values hold their counts.

The output is sorted in descending order of count by default; pass ascending=True to reverse the order. Pass normalize=True to get relative frequencies instead of counts, that is, each count divided by the total number of non-null values in the Series.

In [8]:
df_books['Author'].value_counts()

Jeff Kinney                           12
Suzanne Collins                       11
Gary Chapman                          11
Rick Riordan                          11
American Psychological Association    10
                                      ..
Mitch Albom                            1
W. Cleon Skousen                       1
Bruce Springsteen                      1
Brené Brown                           1
George Orwell                          1
Name: Author, Length: 248, dtype: int64
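
The ascending and normalize options described above can be sketched on a short hypothetical Series. The author names are taken from the output above, but the counts here are made up for illustration.

```python
import pandas as pd

# Hypothetical sample: three, two, and one occurrence respectively.
authors = pd.Series(
    ["Jeff Kinney", "Jeff Kinney", "Jeff Kinney",
     "Rick Riordan", "Rick Riordan", "George Orwell"],
    name="Author",
)

counts = authors.value_counts()                      # most frequent first
rarest_first = authors.value_counts(ascending=True)  # least frequent first
shares = authors.value_counts(normalize=True)        # proportions that sum to 1
```

With this sample, `shares["Jeff Kinney"]` is 0.5 because three of the six entries belong to that author.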

drop_duplicates() is a method in Pandas that can be used to remove duplicate rows from a DataFrame. It returns a new DataFrame with only the unique rows.

By default, drop_duplicates() considers all columns in the DataFrame to identify duplicate rows. You can specify one or more columns to consider as a subset of the DataFrame by passing them as a list to the subset parameter.

df.drop_duplicates(subset=['A', 'B'])

By default, drop_duplicates() keeps the first occurrence of each unique row and drops the subsequent occurrences. You can keep the last occurrence of each unique row instead by passing keep='last'. You can also remove all occurrences of duplicates with keep=False.

In [9]:
df_books.drop_duplicates()

Unnamed: 0,Name,Author,User Rating,Reviews,Price,Year,Genre
0,10-Day Green Smoothie Cleanse,JJ Smith,4.7,17350,8,2016,Non Fiction
1,11/22/63: A Novel,Stephen King,4.6,2052,22,2011,Fiction
2,12 Rules for Life: An Antidote to Chaos,Jordan B. Peterson,4.7,18979,15,2018,Non Fiction
3,1984 (Signet Classics),George Orwell,4.7,21424,6,2017,Fiction
4,"5,000 Awesome Facts (About Everything!) (Natio...",National Geographic Kids,4.8,7665,12,2019,Non Fiction
...,...,...,...,...,...,...,...
545,Wrecking Ball (Diary of a Wimpy Kid Book 14),Jeff Kinney,4.9,9413,8,2019,Fiction
546,You Are a Badass: How to Stop Doubting Your Gr...,Jen Sincero,4.7,14331,8,2016,Non Fiction
547,You Are a Badass: How to Stop Doubting Your Gr...,Jen Sincero,4.7,14331,8,2017,Non Fiction
548,You Are a Badass: How to Stop Doubting Your Gr...,Jen Sincero,4.7,14331,8,2018,Non Fiction
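
The subset and keep options can be compared side by side. This is a minimal sketch on a hypothetical four-row `sample` DataFrame with placeholder titles, not the bestsellers data.

```python
import pandas as pd

# Hypothetical sample: rows 2 and 3 are identical; rows 0 and 1 share only Name.
sample = pd.DataFrame({
    "Name": ["Book A", "Book A", "Book B", "Book B"],
    "Year": [2018, 2019, 2019, 2019],
})

full_rows = sample.drop_duplicates()                         # compares all columns
first_per_name = sample.drop_duplicates(subset=["Name"])     # keeps first per Name
last_per_name = sample.drop_duplicates(subset=["Name"], keep="last")
no_repeats = sample.drop_duplicates(keep=False)              # drops every duplicated row
```

Each call returns a new DataFrame and leaves `sample` itself unchanged, so the result has to be assigned to a variable to be kept.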


sort_values() is a method that sorts a Pandas DataFrame by the values in one or more columns. It sorts in ascending order by default; pass ascending=False to sort in descending order.

In [12]:
df_books.sort_values('Year')

Unnamed: 0,Name,Author,User Rating,Reviews,Price,Year,Genre
177,"I, Alex Cross",James Patterson,4.6,1320,7,2009,Fiction
131,Glenn Beck's Common Sense: The Case Against an...,Glenn Beck,4.6,1365,11,2009,Non Fiction
417,The Last Lecture,Randy Pausch,4.7,4028,9,2009,Non Fiction
241,New Moon (The Twilight Saga),Stephenie Meyer,4.6,5680,10,2009,Fiction
72,Diary of a Wimpy Kid: The Last Straw (Book 3),Jeff Kinney,4.8,3837,15,2009,Fiction
...,...,...,...,...,...,...,...
150,Guts,Raina Telgemeier,4.8,5476,7,2019,Non Fiction
466,The Subtle Art of Not Giving a F*ck: A Counter...,Mark Manson,4.6,26490,15,2019,Non Fiction
462,The Silent Patient,Alex Michaelides,4.5,27536,14,2019,Fiction
130,"Girl, Wash Your Face: Stop Believing the Lies ...",Rachel Hollis,4.6,22288,12,2019,Non Fiction


In [11]:
df_books.sort_values('Year', ascending=False)

Unnamed: 0,Name,Author,User Rating,Reviews,Price,Year,Genre
549,You Are a Badass: How to Stop Doubting Your Gr...,Jen Sincero,4.7,14331,8,2019,Non Fiction
294,School Zone - Big Preschool Workbook - Ages 4 ...,School Zone,4.8,23047,6,2019,Non Fiction
489,The Wonderful Things You Will Be,Emily Winfield Martin,4.9,8842,10,2019,Fiction
263,P is for Potty! (Sesame Street) (Lift-the-Flap),Naomi Kleinberg,4.7,10820,5,2019,Non Fiction
130,"Girl, Wash Your Face: Stop Believing the Lies ...",Rachel Hollis,4.6,22288,12,2019,Non Fiction
...,...,...,...,...,...,...,...
418,The Last Olympian (Percy Jackson and the Olymp...,Rick Riordan,4.8,4628,7,2009,Fiction
38,"Breaking Dawn (The Twilight Saga, Book 4)",Stephenie Meyer,4.6,9769,13,2009,Fiction
92,"Eat This, Not That! Thousands of Simple Food S...",David Zinczenko,4.3,956,14,2009,Non Fiction
139,Good to Great: Why Some Companies Make the Lea...,Jim Collins,4.5,3457,14,2009,Non Fiction
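
Sorting by more than one column follows the same pattern. The sketch below uses a small hypothetical `sample` DataFrame built from four rows shown in the outputs above, and sorts by Year descending and then by Price ascending.

```python
import pandas as pd

# Sample built from four rows that appear in the outputs above.
sample = pd.DataFrame({
    "Name": ["Guts", "The Silent Patient", "The Last Lecture", "I, Alex Cross"],
    "Year": [2019, 2019, 2009, 2009],
    "Price": [7, 14, 9, 7],
})

# Single column, ascending by default.
by_year = sample.sort_values("Year")

# Several columns: one ascending flag per column.
newest_then_cheapest = sample.sort_values(["Year", "Price"], ascending=[False, True])

# ignore_index=True renumbers the result 0..n-1 instead of keeping old labels.
renumbered = sample.sort_values(["Year", "Price"], ascending=[False, True],
                                ignore_index=True)
```

Without `ignore_index=True` the rows keep their original index labels, which is why the outputs above show labels such as 177 or 549 out of order.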

