**Problem:**

You are given the following dataset:
1. **Audible Data** : https://1drv.ms/u/s!AiqdXCxPTydhoog8ckLN-6Cw55fzIg?e=EWgZ5d

Your task is to:
- Find the problems with the dataset.
- Define the Data Quality Dimensions.
- Try to clean the dataset.

In [52]:
import pandas as pd

In [53]:
audible = pd.read_csv('audible_uncleaned.csv')
print(audible.shape)
audible.head()

(87489, 8)


Unnamed: 0,name,author,narrator,time,releasedate,language,stars,price
0,Geronimo Stilton #11 & #12,Writtenby:GeronimoStilton,Narratedby:BillLobely,2 hrs and 20 mins,4/8/2008,English,5 out of 5 stars34 ratings,468
1,The Burning Maze,Writtenby:RickRiordan,Narratedby:RobbieDaymond,13 hrs and 8 mins,1/5/2018,English,4.5 out of 5 stars41 ratings,820
2,The Deep End,Writtenby:JeffKinney,Narratedby:DanRussell,2 hrs and 3 mins,6/11/2020,English,4.5 out of 5 stars38 ratings,410
3,Daughter of the Deep,Writtenby:RickRiordan,Narratedby:SoneelaNankani,11 hrs and 16 mins,5/10/2021,English,4.5 out of 5 stars12 ratings,615
4,"The Lightning Thief: Percy Jackson, Book 1",Writtenby:RickRiordan,Narratedby:JesseBernstein,10 hrs,13-01-10,English,4.5 out of 5 stars181 ratings,820


### 1. Write a summary for your data

With audiobooks growing in popularity, I gathered this data to understand how the audiobook market has grown over the years. From authors to release dates, the data covers the important details of audiobooks released from 1998 to 2025 (the later dates are pre-planned releases).

I have yet to find a great audiobooks dataset, hence the urge to make one that provides information on the basics and the history of audiobooks. I plan to improve the dataset with more details in the near future.

### 2. Write Column descriptions

#### **Table** -> `audible`:
- name - name of the audiobook
- author - author(s) of the audiobook
- narrator - narrator(s) of the audiobook
- time - runtime of the audiobook, stored as text (e.g. `2 hrs and 20 mins`)
- releasedate - date of release
- language - language the audiobook is released in
- stars - rating out of 5 and the number of people who rated it
- price - price in rupees



### 3. Add any additional information



### Types of Assessment

There are 2 types of assessment styles:
- Manual - Looking through the data manually in Google Sheets
- Programmatic - Using pandas functions such as `info()`, `describe()` or `sample()`

#### Steps in Assessment
There are 2 steps involved in Assessment
- Discover 
- Document

### Issues with the dataset
1. Dirty Data (Quality Issues)
- time is stored as text such as `2 hrs and 20 mins` instead of a number of minutes `consistency`
- release date is written in two formats, with `/` (e.g. `4/8/2008`) or with `-` (e.g. `13-01-10`) `consistency`
- stars column mixes the star rating and the number of ratings in one string, or says `Not rated yet` `consistency`

2. Messy Data (Structural Issues)
- author has an unnecessary `Writtenby:` prefix
- narrator has an unnecessary `Narratedby:` prefix
- price has some values as int and some as float
- there is no space between the parts of author names and narrator names
- price column name does not say what the currency is
- data types need changing: release date to datetime, language to categorical, and time, stars, ratings and price to int or float
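
The issues above were found by eye, but a few of them can also be counted programmatically. The sketch below is only an illustration on a four-row sample copied from rows shown in this notebook (the `sample` frame and the variable names are my own, not part of the dataset); on the real data the same expressions would run against `audible`.

```python
import pandas as pd

# Tiny sample copied from rows displayed in this notebook (not the full dataset)
sample = pd.DataFrame({
    "releasedate": ["4/8/2008", "1/5/2018", "13-01-10", "21-02-17"],
    "stars": ["5 out of 5 stars34 ratings", "4.5 out of 5 stars41 ratings",
              "4.5 out of 5 stars181 ratings", "Not rated yet"],
    "time": ["2 hrs and 20 mins", "13 hrs and 8 mins", "10 hrs", "10 hrs and 7 mins"],
})

# How many rows use each date separator?
slash_dates = sample["releasedate"].str.contains("/", regex=False).sum()
dash_dates = sample["releasedate"].str.contains("-", regex=False).sum()

# How many rows have no rating at all?
unrated = (sample["stars"] == "Not rated yet").sum()

# Which unit words appear in the runtime strings (ignoring the joining "and")?
time_units = set(sample["time"].str.findall(r"[A-Za-z]+").explode()) - {"and"}

print(slash_dates, dash_dates, unrated, time_units)
```

Listing the unit words is a cheap way to spot variants such as `hr` or `min` before writing a parser for the column.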

#### Automatic Assessment
- head and tail
- sample 
- info
- isnull
- duplicated
- describe

In [54]:
audible.head()

Unnamed: 0,name,author,narrator,time,releasedate,language,stars,price
0,Geronimo Stilton #11 & #12,Writtenby:GeronimoStilton,Narratedby:BillLobely,2 hrs and 20 mins,4/8/2008,English,5 out of 5 stars34 ratings,468
1,The Burning Maze,Writtenby:RickRiordan,Narratedby:RobbieDaymond,13 hrs and 8 mins,1/5/2018,English,4.5 out of 5 stars41 ratings,820
2,The Deep End,Writtenby:JeffKinney,Narratedby:DanRussell,2 hrs and 3 mins,6/11/2020,English,4.5 out of 5 stars38 ratings,410
3,Daughter of the Deep,Writtenby:RickRiordan,Narratedby:SoneelaNankani,11 hrs and 16 mins,5/10/2021,English,4.5 out of 5 stars12 ratings,615
4,"The Lightning Thief: Percy Jackson, Book 1",Writtenby:RickRiordan,Narratedby:JesseBernstein,10 hrs,13-01-10,English,4.5 out of 5 stars181 ratings,820


In [55]:
audible.tail()

Unnamed: 0,name,author,narrator,time,releasedate,language,stars,price
87484,Last Days of the Bus Club,Writtenby:ChrisStewart,Narratedby:ChrisStewart,7 hrs and 34 mins,9/3/2017,English,Not rated yet,596
87485,The Alps,Writtenby:StephenO'Shea,Narratedby:RobertFass,10 hrs and 7 mins,21-02-17,English,Not rated yet,820
87486,The Innocents Abroad,Writtenby:MarkTwain,Narratedby:FloGibson,19 hrs and 4 mins,30-12-16,English,Not rated yet,938
87487,A Sentimental Journey,Writtenby:LaurenceSterne,Narratedby:AntonLesser,4 hrs and 8 mins,23-02-11,English,Not rated yet,680
87488,Havana,Writtenby:MarkKurlansky,Narratedby:FleetCooper,6 hrs and 1 min,7/3/2017,English,Not rated yet,569


In [56]:
audible.sample(10)

Unnamed: 0,name,author,narrator,time,releasedate,language,stars,price
38574,资治通鉴 9 - 資治通鑑 9 [Zizhi Tongjian 9],Writtenby:司马光-司馬光-SimaGuang,Narratedby:白云出岫-白雲出岫-Baiyunchuxiu,33 hrs and 10 mins,12/2/2018,mandarin_chinese,Not rated yet,1338.0
28253,Mitarbeitermotivation lernen & umsetzen - Das ...,Writtenby:ThorstenMössinger,Narratedby:CelinaBender,1 hr and 16 mins,3/12/2021,german,Not rated yet,233.0
23750,Howard's Gift,Writtenby:EricSinoway,Narratedby:WilliamDufris,9 hrs and 19 mins,2/10/2012,English,Not rated yet,134.0
32569,Awakening Artemis,Writtenby:VanessaChakour,Narratedby:VanessaChakour,14 hrs and 27 mins,10/2/2022,English,Not rated yet,888.0
10608,How Not to Network a Nation,Writtenby:BenjaminPeters,Narratedby:DanaHickox,10 hrs and 16 mins,25-08-16,English,Not rated yet,836.0
83978,La signora della morte,Writtenby:JohnnyRosso,Narratedby:MorenoD'Isep,4 hrs and 37 mins,17-06-20,italian,Not rated yet,267.0
68976,De l'ego au Moi Unique,Writtenby:MarcGafni,Narratedby:PhilippeJoannis,4 hrs and 29 mins,3/2/2017,french,Not rated yet,641.0
15285,Everyone's own Vienna,Writtenby:MitsuyoKakuta,Narratedby:NahokoFort,17 mins,2/10/2014,japanese,Not rated yet,32.0
21874,Living with No Excuses,Writtenby:NoahGalloway,Narratedby:NoahGalloway,7 hrs and 18 mins,24-08-16,English,Not rated yet,500.0
46151,The Unspoken,Writtenby:IanK.Smith,Narratedby:AmirAbdullah,8 hrs and 16 mins,1/10/2020,English,Not rated yet,1008.0


In [57]:
audible.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 87489 entries, 0 to 87488
Data columns (total 8 columns):
 #   Column       Non-Null Count  Dtype 
---  ------       --------------  ----- 
 0   name         87489 non-null  object
 1   author       87489 non-null  object
 2   narrator     87489 non-null  object
 3   time         87489 non-null  object
 4   releasedate  87489 non-null  object
 5   language     87489 non-null  object
 6   stars        87489 non-null  object
 7   price        87489 non-null  object
dtypes: object(8)
memory usage: 5.3+ MB


In [58]:
audible.isnull().sum()

name           0
author         0
narrator       0
time           0
releasedate    0
language       0
stars          0
price          0
dtype: int64

In [59]:
audible.duplicated().sum()

0

In [60]:
audible.describe()

Unnamed: 0,name,author,narrator,time,releasedate,language,stars,price
count,87489,87489,87489,87489,87489,87489,87489,87489
unique,82767,48374,29717,2284,5058,36,665,1011
top,The Art of War,"Writtenby:矢島雅弘,石橋遊",Narratedby:anonymous,2 mins,16-05-18,English,Not rated yet,586
freq,20,874,1034,372,773,61884,72417,5533


In [61]:
audible['name'].duplicated().sum()

4722

In [62]:
audible[audible.name.duplicated()]

Unnamed: 0,name,author,narrator,time,releasedate,language,stars,price
18,Merlin Mission Collection,Writtenby:MaryPopeOsborne,Narratedby:MaryPopeOsborne,10 hrs and 18 mins,2/5/2017,English,5 out of 5 stars11 ratings,1256.00
46,"The Lightning Thief: Percy Jackson, Book 1",Writtenby:RickRiordan,Narratedby:WalterLewis,3 hrs and 51 mins,13-01-10,English,4 out of 5 stars4 ratings,615
54,Merlin Mission Collection,Writtenby:MaryPopeOsborne,Narratedby:MaryPopeOsborne,13 hrs and 24 mins,2/5/2017,English,5 out of 5 stars8 ratings,1256.00
155,Barbie - Superprincesa,Writtenby:Mattel,Narratedby:VanessaPérezJurado,29 mins,7/3/2022,spanish,Not rated yet,38
273,Barbie - Dreamtopia,"Writtenby:Mattel,MartaCisaMuñoz-traductor",Narratedby:MiriamMonlleo,21 mins,31-01-22,catalan,Not rated yet,38
...,...,...,...,...,...,...,...,...
87443,Travels with a Donkey in the Cevennes,Writtenby:RobertLouisStevenson,Narratedby:DenisLawson,2 hrs and 51 mins,13-05-08,English,Not rated yet,569
87456,Gettysburg,Writtenby:JeffShaara,Narratedby:RobertsonDean,1 hr and 12 mins,26-03-07,English,Not rated yet,200
87457,Solo,Writtenby:PenHadow,Narratedby:PenHadow,3 hrs and 5 mins,14-02-05,English,Not rated yet,615
87475,Wanderlust,Writtenby:ElisabethEaves,Narratedby:ErinBennett,9 hrs and 51 mins,26-02-14,English,Not rated yet,668


In [63]:
audible[['name', 'time']].duplicated().sum()

238

In [64]:
audible[audible[['name', 'time']].duplicated()]

Unnamed: 0,name,author,narrator,time,releasedate,language,stars,price
273,Barbie - Dreamtopia,"Writtenby:Mattel,MartaCisaMuñoz-traductor",Narratedby:MiriamMonlleo,21 mins,31-01-22,catalan,Not rated yet,38
510,Los Pitufos - Historias de 3 minutos,"Writtenby:Peyo,RaquelLuqueBenitez-traductor",Narratedby:GuillermoMontoya,34 mins,17-01-22,spanish,Not rated yet,38
1882,Egg Marks the Spot,Writtenby:AmyTimberlake,Narratedby:MichaelBoatman,3 hrs and 54 mins,14-09-21,English,Not rated yet,398
2195,Camels,Writtenby:RoseDavin,Narratedby:Anonymous,2 mins,21-02-22,English,Not rated yet,82
2457,Jaime,Writtenby:LeandroCabrera,Narratedby:MarianoPrince,11 mins,4/3/2022,spanish,Not rated yet,65
...,...,...,...,...,...,...,...,...
85103,Schneeriese,Writtenby:SusanKreller,Narratedby:ConstanzeWeinig,5 hrs and 17 mins,2/9/2020,german,Not rated yet,535
85175,Stronger,Writtenby:EricaMarselas,"Narratedby:AmyMelissaBentley,GrahamHalstead",9 hrs and 2 mins,12/11/2019,English,Not rated yet,586
86352,Neither Here Nor There,Writtenby:BillBryson,Narratedby:BillBryson,5 hrs and 38 mins,1/4/2010,English,4.5 out of 5 stars2 ratings,615
87239,Tee and Tour,"Writtenby:Tee,Tour",Narratedby:SuzanneToren,2 hrs and 6 mins,8/7/2004,English,Not rated yet,585
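
`duplicated()` with its default `keep='first'` only flags the second and later copies, so the first member of each duplicate group is missing from the tables above. A small sketch of how the whole group could be inspected, using `subset` and `keep=False`; the `sample` frame is a made-up illustration with values in the style of this dataset:

```python
import pandas as pd

# Illustrative rows only; on the real data, use `audible` instead of `sample`
sample = pd.DataFrame({
    "name": ["Merlin Mission Collection", "Merlin Mission Collection",
             "The Alps", "Camels", "Camels"],
    "time": ["10 hrs and 18 mins", "13 hrs and 24 mins",
             "10 hrs and 7 mins", "2 mins", "2 mins"],
})

# keep=False marks every member of a duplicate group, not just the repeats
same_name = sample[sample.duplicated(subset=["name"], keep=False)]
same_name_and_time = sample[sample.duplicated(subset=["name", "time"], keep=False)]

print(same_name.sort_values("name"))
print(same_name_and_time)
```

Sorting by `name` puts the copies next to each other, which makes it easier to judge whether they are true duplicates or different editions of the same title.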


#### Making a copy

In [65]:
audible_df = audible.copy()

In [66]:
audible_df['author'] = audible_df['author'].str.split(':').str.get(1)
audible_df['narrator'] = audible_df['narrator'].str.split(':').str.get(1)
audible_df.head()

Unnamed: 0,name,author,narrator,time,releasedate,language,stars,price
0,Geronimo Stilton #11 & #12,GeronimoStilton,BillLobely,2 hrs and 20 mins,4/8/2008,English,5 out of 5 stars34 ratings,468
1,The Burning Maze,RickRiordan,RobbieDaymond,13 hrs and 8 mins,1/5/2018,English,4.5 out of 5 stars41 ratings,820
2,The Deep End,JeffKinney,DanRussell,2 hrs and 3 mins,6/11/2020,English,4.5 out of 5 stars38 ratings,410
3,Daughter of the Deep,RickRiordan,SoneelaNankani,11 hrs and 16 mins,5/10/2021,English,4.5 out of 5 stars12 ratings,615
4,"The Lightning Thief: Percy Jackson, Book 1",RickRiordan,JesseBernstein,10 hrs,13-01-10,English,4.5 out of 5 stars181 ratings,820
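
Splitting on `:` and taking element 1 works for the rows shown here, but it would cut off anything after a second colon. As a hedged alternative, the prefix could be removed with an anchored pattern instead. The second value in the sample below is hypothetical and only there to show the difference:

```python
import pandas as pd

# Second value is a hypothetical author string containing an extra colon
authors = pd.Series(["Writtenby:RickRiordan", "Writtenby:Author:WithColon"])

# Approach used above: keep only the piece after the first colon
by_split = authors.str.split(":").str.get(1)

# Alternative: strip the known prefix only when it appears at the start
by_prefix = authors.str.replace(r"^Writtenby:", "", regex=True)

print(by_split.tolist())
print(by_prefix.tolist())
```

The same pattern with `^Narratedby:` would apply to the narrator column.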


In [67]:
import re


# Define a function to split a concatenated name on its capital letters
def split_name(name):
    # Collect capitalized tokens: an uppercase ASCII letter followed by
    # zero or more lowercase ASCII letters
    parts = re.findall(r'[A-Z][a-z]*', name)

    # Return (first, last); a missing part comes back as None
    if len(parts) == 2:
        return parts[0], parts[1]  # Both first and last names
    elif len(parts) == 1:
        return parts[0], None  # Only first name available
    else:
        # Zero tokens or more than two (middle names, initials, several
        # people, non-Latin scripts) are not handled and give no name
        return None, None

In [68]:
split_name('DavidWarner')

('David', 'Warner')
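
`split_name` only returns a name when it finds one or two ASCII capitalized tokens, so values like `MaryPopeOsborne` or `AmyMelissaBentley,GrahamHalstead` come back as `(None, None)`. A minimal sketch of a different approach, which keeps the full name and only restores the missing spaces; `add_spaces` and `split_people` are hypothetical helper names, not functions used elsewhere in this notebook:

```python
import re

def add_spaces(raw):
    """Insert a space wherever a lowercase ASCII letter is directly
    followed by an uppercase ASCII letter."""
    return re.sub(r"(?<=[a-z])(?=[A-Z])", " ", raw)

def split_people(raw):
    """Split a comma-separated credit string into a list of spaced names."""
    return [add_spaces(part) for part in raw.split(",")]

print(add_spaces("MaryPopeOsborne"))
print(add_spaces("StephenO'Shea"))
print(split_people("AmyMelissaBentley,GrahamHalstead"))
```

This is still a heuristic: initials such as `IanK.Smith` become `Ian K.Smith`, and a surname like `McDonald` would be split in two, so the result should be spot-checked before replacing the original column.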

In [69]:
audible_df[['author_first_name', 'author_last_name']] = audible_df['author'].apply(lambda x: pd.Series(split_name(x)))

In [70]:
audible_df.head()

Unnamed: 0,name,author,narrator,time,releasedate,language,stars,price,author_first_name,author_last_name
0,Geronimo Stilton #11 & #12,GeronimoStilton,BillLobely,2 hrs and 20 mins,4/8/2008,English,5 out of 5 stars34 ratings,468,Geronimo,Stilton
1,The Burning Maze,RickRiordan,RobbieDaymond,13 hrs and 8 mins,1/5/2018,English,4.5 out of 5 stars41 ratings,820,Rick,Riordan
2,The Deep End,JeffKinney,DanRussell,2 hrs and 3 mins,6/11/2020,English,4.5 out of 5 stars38 ratings,410,Jeff,Kinney
3,Daughter of the Deep,RickRiordan,SoneelaNankani,11 hrs and 16 mins,5/10/2021,English,4.5 out of 5 stars12 ratings,615,Rick,Riordan
4,"The Lightning Thief: Percy Jackson, Book 1",RickRiordan,JesseBernstein,10 hrs,13-01-10,English,4.5 out of 5 stars181 ratings,820,Rick,Riordan


In [71]:
audible_df[['narrator_first_name', 'narrator_last_name']] = audible_df['narrator'].apply(
    lambda x: pd.Series(split_name(x)))
audible_df.head()

Unnamed: 0,name,author,narrator,time,releasedate,language,stars,price,author_first_name,author_last_name,narrator_first_name,narrator_last_name
0,Geronimo Stilton #11 & #12,GeronimoStilton,BillLobely,2 hrs and 20 mins,4/8/2008,English,5 out of 5 stars34 ratings,468,Geronimo,Stilton,Bill,Lobely
1,The Burning Maze,RickRiordan,RobbieDaymond,13 hrs and 8 mins,1/5/2018,English,4.5 out of 5 stars41 ratings,820,Rick,Riordan,Robbie,Daymond
2,The Deep End,JeffKinney,DanRussell,2 hrs and 3 mins,6/11/2020,English,4.5 out of 5 stars38 ratings,410,Jeff,Kinney,Dan,Russell
3,Daughter of the Deep,RickRiordan,SoneelaNankani,11 hrs and 16 mins,5/10/2021,English,4.5 out of 5 stars12 ratings,615,Rick,Riordan,Soneela,Nankani
4,"The Lightning Thief: Percy Jackson, Book 1",RickRiordan,JesseBernstein,10 hrs,13-01-10,English,4.5 out of 5 stars181 ratings,820,Rick,Riordan,Jesse,Bernstein


In [72]:
audible_df.drop(columns=['author', 'narrator'], inplace=True)

In [73]:
audible_df.head()

Unnamed: 0,name,time,releasedate,language,stars,price,author_first_name,author_last_name,narrator_first_name,narrator_last_name
0,Geronimo Stilton #11 & #12,2 hrs and 20 mins,4/8/2008,English,5 out of 5 stars34 ratings,468,Geronimo,Stilton,Bill,Lobely
1,The Burning Maze,13 hrs and 8 mins,1/5/2018,English,4.5 out of 5 stars41 ratings,820,Rick,Riordan,Robbie,Daymond
2,The Deep End,2 hrs and 3 mins,6/11/2020,English,4.5 out of 5 stars38 ratings,410,Jeff,Kinney,Dan,Russell
3,Daughter of the Deep,11 hrs and 16 mins,5/10/2021,English,4.5 out of 5 stars12 ratings,615,Rick,Riordan,Soneela,Nankani
4,"The Lightning Thief: Percy Jackson, Book 1",10 hrs,13-01-10,English,4.5 out of 5 stars181 ratings,820,Rick,Riordan,Jesse,Bernstein


In [76]:
audible_df['price'] = audible_df['price'].str.replace(',', '')

In [78]:
audible_df['price'] = audible_df['price'].str.replace('Free', '0')

In [82]:
audible_df['price'] = audible_df['price'].astype('float')
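
`astype('float')` raises an error as soon as one value cannot be converted. A sketch of a more forgiving variant with `pd.to_numeric(..., errors='coerce')`, which turns unparseable values into `NaN` so they can be inspected afterwards. The sample values, including the `n/a` entry, are hypothetical:

```python
import pandas as pd

# Hypothetical raw values in the style of the price column
raw_price = pd.Series(["468", "1,256.00", "Free", "n/a"])

cleaned = (
    raw_price.str.replace(",", "", regex=False)  # drop thousands separators
             .replace("Free", "0")               # exact-match replacement
)

# Anything that still is not a number becomes NaN instead of raising
price = pd.to_numeric(cleaned, errors="coerce")

print(price)
print("unparsed rows:", price.isna().sum())
```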

In [83]:
audible_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 87489 entries, 0 to 87488
Data columns (total 10 columns):
 #   Column               Non-Null Count  Dtype  
---  ------               --------------  -----  
 0   name                 87489 non-null  object 
 1   time                 87489 non-null  object 
 2   releasedate          87489 non-null  object 
 3   language             87489 non-null  object 
 4   stars                87489 non-null  object 
 5   price                87489 non-null  float64
 6   author_first_name    53853 non-null  object 
 7   author_last_name     52398 non-null  object 
 8   narrator_first_name  62198 non-null  object 
 9   narrator_last_name   61071 non-null  object 
dtypes: float64(1), object(9)
memory usage: 6.7+ MB


In [85]:
audible_df.rename(columns={'price': 'price (in Rupees)'}, inplace=True)

In [102]:
def parse_date(date):
    if '/' in date:
        return pd.to_datetime(date, format='%d/%m/%Y', errors='coerce')
    elif '-' in date:
        return pd.to_datetime(date, format='%d-%m-%y', errors='coerce')
    else:
        return pd.NaT  # Return NaT for any other unrecognized formats

In [100]:
audible_df['releasedate'].head()

0     4/8/2008
1     1/5/2018
2    6/11/2020
3    5/10/2021
4     13-01-10
Name: releasedate, dtype: object

In [106]:
audible_df['releasedate'] = audible_df['releasedate'].apply(parse_date)

In [107]:
audible_df.head()

Unnamed: 0,name,time,releasedate,language,stars,price (in Rupees),author_first_name,author_last_name,narrator_first_name,narrator_last_name
0,Geronimo Stilton #11 & #12,2 hrs and 20 mins,2008-08-04,English,5 out of 5 stars34 ratings,468.0,Geronimo,Stilton,Bill,Lobely
1,The Burning Maze,13 hrs and 8 mins,2018-05-01,English,4.5 out of 5 stars41 ratings,820.0,Rick,Riordan,Robbie,Daymond
2,The Deep End,2 hrs and 3 mins,2020-11-06,English,4.5 out of 5 stars38 ratings,410.0,Jeff,Kinney,Dan,Russell
3,Daughter of the Deep,11 hrs and 16 mins,2021-10-05,English,4.5 out of 5 stars12 ratings,615.0,Rick,Riordan,Soneela,Nankani
4,"The Lightning Thief: Percy Jackson, Book 1",10 hrs,2010-01-13,English,4.5 out of 5 stars181 ratings,820.0,Rick,Riordan,Jesse,Bernstein
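
`parse_date` is called once per row through `apply`. The same idea can be sketched in a vectorised form by parsing the whole column once per format and combining the results. This assumes, as the function above does, that slash dates are day-first with a four-digit year and dash dates are day-first with a two-digit year; the last sample value is hypothetical:

```python
import pandas as pd

# First three values appear in this notebook; the last one is hypothetical
raw = pd.Series(["4/8/2008", "13-01-10", "6/11/2020", "not a date"])

# Each pass parses the rows that match its format and leaves NaT elsewhere
slash = pd.to_datetime(raw, format="%d/%m/%Y", errors="coerce")
dash = pd.to_datetime(raw, format="%d-%m-%y", errors="coerce")

# Fill the gaps of the first pass with the results of the second
parsed = slash.fillna(dash)

print(parsed)
```

Rows that match neither format stay `NaT`, so `parsed.isna().sum()` gives a quick count of dates that still need attention.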


In [108]:
audible_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 87489 entries, 0 to 87488
Data columns (total 10 columns):
 #   Column               Non-Null Count  Dtype         
---  ------               --------------  -----         
 0   name                 87489 non-null  object        
 1   time                 87489 non-null  object        
 2   releasedate          87489 non-null  datetime64[ns]
 3   language             87489 non-null  object        
 4   stars                87489 non-null  object        
 5   price (in Rupees)    87489 non-null  float64       
 6   author_first_name    53853 non-null  object        
 7   author_last_name     52398 non-null  object        
 8   narrator_first_name  62198 non-null  object        
 9   narrator_last_name   61071 non-null  object        
dtypes: datetime64[ns](1), float64(1), object(8)
memory usage: 6.7+ MB



- language to categorical
- time, stars, ratings and price to int or float

In [110]:
audible_df['language'].value_counts()

language
English             61884
german               8295
spanish              3496
japanese             3167
italian              2694
french               2386
russian              1804
danish                935
portuguese            526
swedish               515
Hindi                 436
polish                224
finnish               197
dutch                 190
tamil                 161
catalan               153
mandarin_chinese       97
icelandic              52
romanian               50
hungarian              36
urdu                   34
afrikaans              28
czech                  23
turkish                20
greek                  18
arabic                 16
norwegian              16
galician               10
bulgarian               9
korean                  4
slovene                 4
hebrew                  2
basque                  2
telugu                  2
lithuanian              2
ukrainian               1
Name: count, dtype: int64

In [112]:
audible_df['language'] = audible_df['language'].astype(dtype='category')
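
The value counts above show mixed casing (`English` and `Hindi` are capitalized, the rest are lowercase) and one underscore (`mandarin_chinese`). A sketch of normalizing the labels before converting to a category, so that spellings differing only in case cannot end up as separate categories; the last sample value is hypothetical:

```python
import pandas as pd

# Last value is a hypothetical lowercase variant of an existing label
language = pd.Series(["English", "german", "Hindi", "mandarin_chinese", "english"])

normalized = (
    language.str.replace("_", " ", regex=False)  # mandarin_chinese -> mandarin chinese
            .str.title()                         # consistent capitalization
            .astype("category")
)

print(list(normalized.cat.categories))
```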

In [119]:
audible_df['time'].value_counts().sample(15)

time
37 hrs and 3 mins       1
57 mins               166
4 hrs and 32 mins     109
16 hrs and 37 mins     11
27 hrs and 40 mins      3
1 hr and 32 mins       95
38 mins               188
16 hrs and 5 mins      15
4 hrs and 14 mins      83
33 hrs and 10 mins      2
10 hrs and 40 mins     82
8 hrs and 18 mins      99
17 hrs and 56 mins      8
11 hrs and 35 mins     54
11 hrs and 42 mins     59
Name: count, dtype: int64

In [135]:
def time_calculator(hour_time):
    hour_time = hour_time.split()
    if 'Less' in hour_time:
        return int(hour_time[2])
    elif 'hrs' in hour_time and 'mins' in hour_time:
        hours = int(hour_time[0])
        minutes = int(hour_time[3])
        return (hours * 60) + minutes
    else:
        return int(hour_time[0])
        
        
print(time_calculator('37 hrs and 3 mins'))
print(time_calculator('40 mins'))

2223
40


In [137]:
audible_df['time'] = audible_df['time'].apply(time_calculator)
audible_df.rename(columns={'time':'time (in minutes)'}, inplace=True)
audible_df.head()

Unnamed: 0,name,time (in minutes),releasedate,language,stars,price (in Rupees),author_first_name,author_last_name,narrator_first_name,narrator_last_name
0,Geronimo Stilton #11 & #12,140,2008-08-04,English,5 out of 5 stars34 ratings,468.0,Geronimo,Stilton,Bill,Lobely
1,The Burning Maze,788,2018-05-01,English,4.5 out of 5 stars41 ratings,820.0,Rick,Riordan,Robbie,Daymond
2,The Deep End,123,2020-11-06,English,4.5 out of 5 stars38 ratings,410.0,Jeff,Kinney,Dan,Russell
3,Daughter of the Deep,676,2021-10-05,English,4.5 out of 5 stars12 ratings,615.0,Rick,Riordan,Soneela,Nankani
4,"The Lightning Thief: Percy Jackson, Book 1",10,2010-01-13,English,4.5 out of 5 stars181 ratings,820.0,Rick,Riordan,Jesse,Bernstein
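
Row 4 above shows a problem: `10 hrs` became 10 minutes instead of 600. `time_calculator` only multiplies by 60 when both `hrs` and `mins` are present, so hour-only values and the singular forms (`1 hr and 16 mins`, `6 hrs and 1 min`) fall into the last branch and return just the first number. A sketch of a parser that reads hours and minutes independently; `runtime_minutes` is a hypothetical name, and the `Less than 1 minute` wording is an assumption based on the `'Less'` branch above:

```python
import re

def runtime_minutes(text):
    """Convert strings like '1 hr and 16 mins' or '10 hrs' to total minutes."""
    hours = re.search(r"(\d+)\s*hr", text)     # matches 'hr' and 'hrs'
    minutes = re.search(r"(\d+)\s*min", text)  # matches 'min', 'mins', 'minute'
    if not hours and not minutes:
        return None  # unrecognised format
    total = 0
    if hours:
        total += int(hours.group(1)) * 60
    if minutes:
        total += int(minutes.group(1))
    return total

for value in ["10 hrs", "1 hr and 16 mins", "6 hrs and 1 min", "40 mins"]:
    print(value, "->", runtime_minutes(value))
```

Applied to the original text column with `.apply(runtime_minutes)`, this would replace the `time_calculator` step. Any `None` results point to runtime formats not covered here.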


In [151]:
audible_df['stars'].value_counts().sample(10)

stars
4.5 out of 5 stars41 ratings     16
4.5 out of 5 stars747 ratings     1
5 out of 5 stars18 ratings       18
4.5 out of 5 stars62 ratings      9
5 out of 5 stars1,425 ratings     1
5 out of 5 stars35 ratings        2
4.5 out of 5 stars511 ratings     1
5 out of 5 stars91 ratings        1
4 out of 5 stars120 ratings       1
5 out of 5 stars159 ratings       1
Name: count, dtype: int64

In [170]:
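def stars_split(row):
    # Unrated titles carry no numbers: keep the label and count 0 ratings
    if "Not" in row:
        return "Not rated yet", 0

    # e.g. ['4.5', 'out', 'of', '5', 'stars41', 'ratings']
    row = row.split()

    # Token 4 is 'stars' fused with the count; drop thousands separators
    row[4] = row[4].replace(',', '')

    # Star value first, then the count after the 5-character 'stars' prefix
    return float(row[0]), int(row[4][5:])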
    
print(stars_split('Not rated yet'))
print(stars_split('4.5 out of 5 stars2 ratings'))
print(stars_split('5 out of 5 stars1,425 ratings'))

('Not rated yet', 0)
(4.5, 2)
(5.0, 1425)
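
Because `stars_split` returns the string `Not rated yet` for unrated titles, the resulting star column mixes text and numbers and stays `object` dtype. A sketch of an alternative with `str.extract`, where rows that do not match the pattern become `NaN` and the column can be a plain float; the regular expression is my own reading of the formats shown above:

```python
import pandas as pd

stars = pd.Series([
    "5 out of 5 stars34 ratings",
    "4.5 out of 5 stars1,425 ratings",
    "Not rated yet",
])

# Named groups become column names; non-matching rows yield NaN in both
pattern = r"^(?P<stars_out_of_five>[\d.]+) out of 5 stars(?P<ratings>[\d,]+) ratings?$"
parts = stars.str.extract(pattern)

parts["stars_out_of_five"] = parts["stars_out_of_five"].astype(float)
parts["ratings"] = (
    parts["ratings"].str.replace(",", "", regex=False)  # 1,425 -> 1425
                    .fillna("0")                         # unrated -> 0 ratings
                    .astype(int)
)

print(parts)
```

Keeping unrated titles as `NaN` means `mean()` and `describe()` skip them automatically instead of failing on a text value.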


In [171]:
audible_df['stars'].apply(stars_split)

0                 (5.0, 34)
1                 (4.5, 41)
2                 (4.5, 38)
3                 (4.5, 12)
4                (4.5, 181)
                ...        
87484    (Not rated yet, 0)
87485    (Not rated yet, 0)
87486    (Not rated yet, 0)
87487    (Not rated yet, 0)
87488    (Not rated yet, 0)
Name: stars, Length: 87489, dtype: object

In [172]:
audible_df[['stars_out_of_five','ratings']] = pd.DataFrame(audible_df['stars'].apply(stars_split).tolist(), index=audible_df.index)

In [174]:
audible_df.drop(columns='stars', inplace=True)
audible_df.head()

Unnamed: 0,name,time (in minutes),releasedate,language,price (in Rupees),author_first_name,author_last_name,narrator_first_name,narrator_last_name,stars_out_of_five,ratings
0,Geronimo Stilton #11 & #12,140,2008-08-04,English,468.0,Geronimo,Stilton,Bill,Lobely,5.0,34
1,The Burning Maze,788,2018-05-01,English,820.0,Rick,Riordan,Robbie,Daymond,4.5,41
2,The Deep End,123,2020-11-06,English,410.0,Jeff,Kinney,Dan,Russell,4.5,38
3,Daughter of the Deep,676,2021-10-05,English,615.0,Rick,Riordan,Soneela,Nankani,4.5,12
4,"The Lightning Thief: Percy Jackson, Book 1",10,2010-01-13,English,820.0,Rick,Riordan,Jesse,Bernstein,4.5,181
