## Mini Project

### Introduction

Apple provides an API for retrieving useful information about iTunes and Apple Book Store content.
You can read more about this service <a href="https://affiliate.itunes.apple.com/resources/documentation/itunes-store-web-service-search-api/">here</a>. In this mini project, we will use this service to get information about the first 50 audiobooks in the Apple store that match the search term "data".

### Step .1
- Fetch the data using the following URL: https://itunes.apple.com/search?term=data&country=us&entity=audiobook&limit=50
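The query string in this URL can also be assembled from its parts, which makes the search term or the limit easier to change. A minimal sketch using only the standard library; the names `base_url`, `params`, and `search_url` are illustrative and not part of the original exercise:

```python
from urllib.parse import urlencode

# Illustrative names; the parameter values are taken from the URL above.
base_url = 'https://itunes.apple.com/search'
params = {
    'term': 'data',         # search keyword
    'country': 'us',        # country code
    'entity': 'audiobook',  # restrict results to audiobooks
    'limit': 50,            # maximum number of results
}

search_url = f'{base_url}?{urlencode(params)}'
print(search_url)
```

Passing the same dictionary as `requests.get(base_url, params=params)` lets `requests` build the query string instead.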

In [None]:
# Your Code Here
import requests
import pandas as pd
from bs4 import BeautifulSoup

url = 'https://itunes.apple.com/search?term=data&country=us&entity=audiobook&limit=50'
response = requests.get(url)
response.raise_for_status()  # stop early if the request failed
json_data = response.json()  # parse the JSON body of the response
# Each entry under 'results' becomes one row of the DataFrame
df = pd.json_normalize(json_data, 'results')

### Step .2
- Save these data into a DataFrame that should look as follows (extract the required data only):

|    | artistName    | collectionName                                                                                         | releaseDate          | primaryGenreName            |   collectionPrice | collectionViewUrl                                                                                         |
|---:|:--------------|:-------------------------------------------------------------------------------------------------------|:---------------------|:----------------------------|------------------:|:----------------------------------------------------------------------------------------------------------|
|  0 | Emily Oster   | Cribsheet: A Data-Driven Guide to Better, More Relaxed Parenting, from Birth to Preschool (Unabridged) | 2019-04-23T07:00:00Z | Nonfiction                  |             14.99 | https://books.apple.com/us/audiobook/cribsheet-a-data-driven-guide-to-better-more/id1459654300?uo=4       |
|  1 | George Gilder | Life After Google: The Fall of Big Data and the Rise of the Blockchain Economy                         | 2018-07-17T07:00:00Z | Business & Personal Finance |             14.99 | https://books.apple.com/us/audiobook/life-after-google-the-fall-of-big-data-and/id1454667331?uo=4         |
|  .. | ...  | ...    | ... | ...  | ... | ... |
| 49 | William Sullivan | SQL Data Warehouse Database Management, SQL Server, Structured Query Language, Business Intelligence, Data Models: Master SQL Programming (Unabridged) | 2018-05-14T07:00:00Z | Science & Nature   |              5.99 | https://books.apple.com/us/audiobook/sql-data-warehouse-database-management-sql-server-structured/id1385009049?uo=4 |

In [None]:
# Your Code Here
sub_df = df[['artistName', 'collectionName', 'releaseDate', 'primaryGenreName', 'collectionPrice', 'collectionViewUrl']]
sub_df.head()

|    | artistName | collectionName | releaseDate | primaryGenreName | collectionPrice | collectionViewUrl |
|---:|:-----------|:---------------|:------------|:-----------------|----------------:|:------------------|
|  0 | Emily Oster | Cribsheet: A Data-Driven Guide to Better, More... | 2019-04-23T07:00:00Z | Nonfiction | 14.99 | https://books.apple.com/us/audiobook/cribsheet... |
|  1 | Cathy O'Neil | Weapons of Math Destruction: How Big Data Incr... | 2016-09-06T07:00:00Z | Nonfiction | 14.99 | https://books.apple.com/us/audiobook/weapons-o... |
|  2 | Gene Kim | The Unicorn Project: A Novel About Developers,... | 2019-11-26T08:00:00Z | Business & Personal Finance | 21.99 | https://books.apple.com/us/audiobook/the-unico... |
|  3 | Charles Wheelan | Naked Statistics: Stripping the Dread from the... | 2013-04-23T07:00:00Z | Nonfiction | 21.99 | https://books.apple.com/us/audiobook/naked-sta... |
|  4 | Emily Oster | The Family Firm: A Data-Driven Guide to Better... | 2021-08-03T07:00:00Z | Nonfiction | 14.99 | https://books.apple.com/us/audiobook/the-famil... |


### Step .3
- Rename the dataframe columns to be: `'author','collectionName','date','genre','price','url'`

In [None]:
# Your Code Here
named_df = sub_df.rename(columns={
    'artistName': 'author',
    'collectionName': 'collectionName',
    'releaseDate': 'date',
    'primaryGenreName': 'genre',
    'collectionPrice': 'price',
    'collectionViewUrl': 'url'
})
named_df.head()

|    | author | collectionName | date | genre | price | url |
|---:|:-------|:---------------|:-----|:------|------:|:----|
|  0 | Emily Oster | Cribsheet: A Data-Driven Guide to Better, More... | 2019-04-23T07:00:00Z | Nonfiction | 14.99 | https://books.apple.com/us/audiobook/cribsheet... |
|  1 | Cathy O'Neil | Weapons of Math Destruction: How Big Data Incr... | 2016-09-06T07:00:00Z | Nonfiction | 14.99 | https://books.apple.com/us/audiobook/weapons-o... |
|  2 | Gene Kim | The Unicorn Project: A Novel About Developers,... | 2019-11-26T08:00:00Z | Business & Personal Finance | 21.99 | https://books.apple.com/us/audiobook/the-unico... |
|  3 | Charles Wheelan | Naked Statistics: Stripping the Dread from the... | 2013-04-23T07:00:00Z | Nonfiction | 21.99 | https://books.apple.com/us/audiobook/naked-sta... |
|  4 | Emily Oster | The Family Firm: A Data-Driven Guide to Better... | 2021-08-03T07:00:00Z | Nonfiction | 14.99 | https://books.apple.com/us/audiobook/the-famil... |


### Step .4
- Your DataFrame is missing some important data, such as `the name of the narrator` and `the length of the audiobook`.
Add this data for each book by fetching and extracting it from the audiobook `url` stored in your DataFrame.

**Hints:** 
   - Write code to visit each url and find the `div` elements with the class `book-badge__value` on that page; this gives you all these values in a list.
   - You can use `df[col_name].apply(function_name)` to apply a function to each value in a column.
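The second hint can be sketched offline on a toy DataFrame. Here `lookup_badges` is a hypothetical stand-in for the function that would download and parse each page; it reads from a hard-coded dictionary of made-up values so that the example runs without network access:

```python
import pandas as pd

# Made-up data standing in for the values scraped from each audiobook page.
FAKE_PAGES = {
    'https://example.com/book-a': ('AB', '10:36'),
    'https://example.com/book-b': ('CD', '06:23'),
}

def lookup_badges(url):
    """Hypothetical stand-in for the scraping function: returns (narrator, length)."""
    return FAKE_PAGES.get(url, (None, None))

toy_df = pd.DataFrame({'url': list(FAKE_PAGES)})

# apply() calls lookup_badges once per URL and returns a Series of tuples.
badges = toy_df['url'].apply(lookup_badges)

# Split each tuple into two new columns.
toy_df['narrator'] = badges.apply(lambda pair: pair[0])
toy_df['length'] = badges.apply(lambda pair: pair[1])
print(toy_df)
```

Because `apply` returns one result per row, the new columns stay aligned with the DataFrame even when a page yields no values.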


In [None]:
# Your Code Here
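def get_badge_values(url):
    """Return (narrator, length) scraped from one audiobook page."""
    response = requests.get(url)
    soup = BeautifulSoup(response.text, 'html.parser')  # name the parser explicitly
    values = [div.text.strip() for div in soup.find_all('div', {'class': 'book-badge__value'})]
    # As before, the narrator is read from the first badge and the length from the third.
    # Fall back to None so both lists keep one entry per book.
    narrator = values[0] if len(values) > 0 else None
    length = values[2] if len(values) > 2 else None
    return narrator, length

narrator_list = []
length_list = []
for url in named_df['url']:
    narrator, length = get_badge_values(url)
    narrator_list.append(narrator)
    length_list.append(length)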

In [None]:
named_df['narrator'] = narrator_list
named_df['length'] = length_list
named_df.head()

|    | author | collectionName | date | genre | price | url | narrator | length |
|---:|:-------|:---------------|:-----|:------|------:|:----|:---------|:-------|
|  0 | Emily Oster | Cribsheet: A Data-Driven Guide to Better, More... | 2019-04-23T07:00:00Z | Nonfiction | 14.99 | https://books.apple.com/us/audiobook/cribsheet... | KV | 10:36 |
|  1 | Cathy O'Neil | Weapons of Math Destruction: How Big Data Incr... | 2016-09-06T07:00:00Z | Nonfiction | 14.99 | https://books.apple.com/us/audiobook/weapons-o... | CO | 06:23 |
|  2 | Gene Kim | The Unicorn Project: A Novel About Developers,... | 2019-11-26T08:00:00Z | Business & Personal Finance | 21.99 | https://books.apple.com/us/audiobook/the-unico... | FC | 12:25 |
|  3 | Charles Wheelan | Naked Statistics: Stripping the Dread from the... | 2013-04-23T07:00:00Z | Nonfiction | 21.99 | https://books.apple.com/us/audiobook/naked-sta... | JD | 10:49 |
|  4 | Emily Oster | The Family Firm: A Data-Driven Guide to Better... | 2021-08-03T07:00:00Z | Nonfiction | 14.99 | https://books.apple.com/us/audiobook/the-famil... | EO | 07:56 |


### Step .5
- In the `date` column, keep only the year
- Convert the year to the `int` datatype and `price` to the `float` datatype
- Remove the column `url` from the dataframe as it is not required anymore

Your dataFrame should look as follows:

|    | author        | collectionName                                                                                         |   date | genre                       |   price | narrator   | length   |
|---:|:--------------|:-------------------------------------------------------------------------------------------------------|-------:|:----------------------------|--------:|:-----------|:---------|
|  0 | Emily Oster   | Cribsheet: A Data-Driven Guide to Better, More Relaxed Parenting, from Birth to Preschool (Unabridged) |   2019 | Nonfiction                  |   14.99 | KV         | 10:36    |
|  1 | George Gilder | Life After Google: The Fall of Big Data and the Rise of the Blockchain Economy                         |   2018 | Business & Personal Finance |   14.99 | EMS        | 09:38    |
|  .. | ...  | ...    | ... | ...  | ... | ... | ... |
| 49 | William Sullivan | SQL Data Warehouse Database Management, SQL Server, Structured Query Language, Business Intelligence, Data Models: Master SQL Programming (Unabridged) |   2018 | Science & Nature |    5.99 | LH         | 02:49    |

In [None]:
# Your Code Here
# Keep only the year of the release date, stored as an integer
named_df['date'] = pd.to_datetime(named_df['date']).dt.year.astype(int)

# Make sure prices are stored as floats
named_df['price'] = named_df['price'].astype(float)

# The url column is no longer needed
named_df = named_df.drop(columns='url')
named_df.head()

|    | author | collectionName | date | genre | price | narrator | length |
|---:|:-------|:---------------|-----:|:------|------:|:---------|:-------|
|  0 | Emily Oster | Cribsheet: A Data-Driven Guide to Better, More... | 2019 | Nonfiction | 14.99 | KV | 10:36 |
|  1 | Cathy O'Neil | Weapons of Math Destruction: How Big Data Incr... | 2016 | Nonfiction | 14.99 | CO | 06:23 |
|  2 | Gene Kim | The Unicorn Project: A Novel About Developers,... | 2019 | Business & Personal Finance | 21.99 | FC | 12:25 |
|  3 | Charles Wheelan | Naked Statistics: Stripping the Dread from the... | 2013 | Nonfiction | 21.99 | JD | 10:49 |
|  4 | Emily Oster | The Family Firm: A Data-Driven Guide to Better... | 2021 | Nonfiction | 14.99 | EO | 07:56 |


### Step .6
- Find the number of audiobooks published before 2015


In [None]:
# Your Code Here
# A boolean mask sums to the number of True entries
count = int((named_df['date'] < 2015).sum())
count

4

### Step .7
- Find the name of the author who has the largest number of audiobooks
- Find all the audiobooks by that author and show their genres

In [None]:
# Your Code Here
author_counts = named_df['author'].value_counts()
max_author = author_counts.idxmax()
max_author

'Emily Oster'

In [None]:
max_author_df = named_df[named_df['author'] == max_author]
max_author_df[['collectionName', 'genre']]

|    | collectionName | genre |
|---:|:---------------|:------|
|  0 | Cribsheet: A Data-Driven Guide to Better, More... | Nonfiction |
|  4 | The Family Firm: A Data-Driven Guide to Better... | Nonfiction |


### Step .8
- Find the title of the most expensive collection
- Find the name of the narrator who has the longest audiobook
- Find the number of different genres
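Comparing lengths requires turning the `length` strings into numbers first. A minimal sketch, assuming every value has the `HH:MM` form seen in the earlier outputs; the helper name `to_minutes` is illustrative:

```python
def to_minutes(length):
    """Convert an 'HH:MM' string (assumed format) to total minutes."""
    hours, minutes = length.split(':')
    return int(hours) * 60 + int(minutes)

# Sample values copied from the first five rows shown earlier.
lengths = ['10:36', '06:23', '12:25', '10:49', '07:56']

# Pick the longest entry by its numeric duration rather than by string order.
longest = max(lengths, key=to_minutes)
print(longest, to_minutes(longest))
```

Working in minutes keeps the comparison meaningful even if a value were not zero-padded, where plain string comparison would give the wrong order.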

In [None]:
# Your Code Here
max_price_idx = named_df['price'].idxmax()
most_expensive_title = named_df.loc[max_price_idx, 'collectionName']
most_expensive_title

'Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems (Unabridged)'

In [None]:
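# Convert 'HH:MM' strings (assumed format) to total minutes so they compare numerically
length_minutes = named_df['length'].apply(lambda x: int(x.split(':')[0]) * 60 + int(x.split(':')[1]))
# idxmax returns the index label of the longest book, which is what .loc expects
longest_audiobook_narrator = named_df.loc[length_minutes.idxmax(), 'narrator']
longest_audiobook_narrator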

'IB'

In [None]:
named_df['genre'].nunique()

6

### Step .9
- Compare the mean price across genres
- Compare the mean price across years
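When a group contains only a few books, its mean says little, so it may help to show the group size next to it. A sketch on made-up rows (these values are not taken from the API response):

```python
import pandas as pd

# Made-up rows for illustration only.
toy_df = pd.DataFrame({
    'genre': ['Nonfiction', 'Nonfiction', 'Science & Nature', 'Nonfiction'],
    'price': [14.99, 14.99, 5.99, 21.99],
})

# Mean price and number of books per genre, highest mean first.
by_genre = (
    toy_df.groupby('genre')['price']
    .agg(['mean', 'count'])
    .sort_values('mean', ascending=False)
)
print(by_genre)
```

The same pattern applies to the yearly comparison by grouping on `date` instead of `genre`.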

In [None]:
# Your Code Here
genre_mean_price = named_df.groupby('genre')['price'].mean()
genre_mean_price

genre
Biographies & Memoirs          12.990000
Business & Personal Finance    14.132857
Mysteries & Thrillers          11.990000
Nonfiction                     16.190000
Science & Nature               15.190000
Self-Development               23.990000
Name: price, dtype: float64

In [None]:
year_mean_price = named_df.groupby('date')['price'].mean()
year_mean_price

date
2013    21.990000
2014     5.323333
2015    10.490000
2016    13.434444
2017    13.490000
2018    16.740000
2019    15.590000
2020    18.390000
2021    17.865000
2022    15.156667
2023    18.323333
Name: price, dtype: float64

### Step .10
- Find the year that has the highest number of Nonfiction audiobooks

In [None]:
nonfiction_df = named_df[named_df['genre'] == 'Nonfiction']
nonfiction_year_counts = nonfiction_df.groupby('date').size()
max_nonfiction_year = nonfiction_year_counts.idxmax()
max_nonfiction_year

2016