# Analyzing Books and Authors

You are a data scientist tasked with analyzing a large collection of books and authors. Your goal is to uncover insights about literary trends, author popularity, and book ratings. To achieve this, you need to determine the best data structure to store and manipulate your data efficiently.

## Objectives

- **Identify Popular Authors**: Determine which authors have the highest average ratings for their books.
- **Analyze Trends**: Discover trends in book genres, publication years, and author nationalities.
- **Calculate Metrics**: Compute various metrics such as the number of books per author, average rating per genre, and publication frequency over the years.

## Data Requirements

To accomplish these objectives, you need to store the following information for each book and author:

### Book Information:

- Title
- Author(s)
- Genre(s)
- Publication Year
- ISBN
- Average Rating
- Number of Ratings
- Number of Pages

### Author Information:

- Name
- Nationality
- Birth Year
- Death Year (if applicable)
- Total Number of Books
- Average Rating of Books


In [1]:
# One Book
author = "George R.R. Martin"
title = "A Game of Thrones"
print("Book '{}' by {}".format(title,author))

Book 'A Game of Thrones' by George R.R. Martin


In [2]:
# Using just lists to print authors and titles of many books

# Lists for authors and titles

authors = ["George R.R. Martin", "J.K. Rowling", "Harper Lee", "Mark Twain", "Jane Austen", "F. Scott Fitzgerald"]
titles = ["A Game of Thrones", "Harry Potter and the Sorcerer's Stone", "To Kill a Mockingbird", 
          "The Adventures of Tom Sawyer", "Pride and Prejudice", "The Great Gatsby"]

for i in range(len(authors)):
    print(f"Book '{titles[i]}' by  {authors[i]}")

Book 'A Game of Thrones' by  George R.R. Martin
Book 'Harry Potter and the Sorcerer's Stone' by  J.K. Rowling
Book 'To Kill a Mockingbird' by  Harper Lee
Book 'The Adventures of Tom Sawyer' by  Mark Twain
Book 'Pride and Prejudice' by  Jane Austen
Book 'The Great Gatsby' by  F. Scott Fitzgerald


In [3]:
book_dict ={
'authors': ["George R.R. Martin", "J.K. Rowling", "Harper Lee", "Mark Twain", "Jane Austen", "F. Scott Fitzgerald"],
'titles' : ["A Game of Thrones", "Harry Potter and the Sorcerer's Stone", "To Kill a Mockingbird", 
          "The Adventures of Tom Sawyer", "Pride and Prejudice", "The Great Gatsby"]
}
for i in range(len(book_dict['authors'])):
    print(f"Book '{book_dict['titles'][i]}' by  {book_dict['authors'][i]}")

Book 'A Game of Thrones' by  George R.R. Martin
Book 'Harry Potter and the Sorcerer's Stone' by  J.K. Rowling
Book 'To Kill a Mockingbird' by  Harper Lee
Book 'The Adventures of Tom Sawyer' by  Mark Twain
Book 'Pride and Prejudice' by  Jane Austen
Book 'The Great Gatsby' by  F. Scott Fitzgerald


In [4]:
book_dict={
'authors' :["George R.R. Martin", "J.K. Rowling", "Harper Lee", "Mark Twain", "Jane Austen", "F. Scott Fitzgerald"],
'titles' : ["A Game of Thrones", "Harry Potter and the Sorcerer's Stone", "To Kill a Mockingbird", 
          "The Adventures of Tom Sawyer", "Pride and Prejudice", "The Great Gatsby"]
}

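# Iterate over the lists stored in book_dict, not the standalone lists from In [2]
for i in range(len(book_dict['authors'])):
    print(book_dict['titles'][i], book_dict['authors'][i])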

A Game of Thrones George R.R. Martin
Harry Potter and the Sorcerer's Stone J.K. Rowling
To Kill a Mockingbird Harper Lee
The Adventures of Tom Sawyer Mark Twain
Pride and Prejudice Jane Austen
The Great Gatsby F. Scott Fitzgerald


In [5]:
book_list=[
    {'authors': "George R.R. Martin", 'titles' : "A Game of Thrones" },
    {'authors':   "J.K. Rowling", 'titles' : "Harry Potter and the Sorcerer's Stone"},
    {'authors':   "Harper Lee", 'titles' : "To Kill a Mockingbird"},
    {'authors':   "Mark Twain", 'titles' : "The Adventures of Tom Sawyer"},
    {'authors': "Jane Austen", 'titles' : "Pride and Prejudice"},
    {'authors': "F. Scott Fitzgerald", 'titles' : "The Great Gatsby"}
]

# Each element of book_list is one book's dictionary
for book in book_list:
    print(book)

{'authors': 'George R.R. Martin', 'titles': 'A Game of Thrones'}
{'authors': 'J.K. Rowling', 'titles': "Harry Potter and the Sorcerer's Stone"}
{'authors': 'Harper Lee', 'titles': 'To Kill a Mockingbird'}
{'authors': 'Mark Twain', 'titles': 'The Adventures of Tom Sawyer'}
{'authors': 'Jane Austen', 'titles': 'Pride and Prejudice'}
{'authors': 'F. Scott Fitzgerald', 'titles': 'The Great Gatsby'}


In [6]:
book_list=[
    {'authors': "George R.R. Martin", 'titles' : "A Game of Thrones" },
    {'authors':   "J.K. Rowling", 'titles' : "Harry Potter and the Sorcerer's Stone"},
    {'authors':   "Harper Lee", 'titles' : "To Kill a Mockingbird"},
    {'authors':   "Mark Twain", 'titles' : "The Adventures of Tom Sawyer"},
    {'authors': "Jane Austen", 'titles' : "Pride and Prejudice"},
    {'authors': "F. Scott Fitzgerald", 'titles' : "The Great Gatsby"}
]

for book in book_list:
    print(book['authors'], book['titles'])

George R.R. Martin A Game of Thrones
J.K. Rowling Harry Potter and the Sorcerer's Stone
Harper Lee To Kill a Mockingbird
Mark Twain The Adventures of Tom Sawyer
Jane Austen Pride and Prejudice
F. Scott Fitzgerald The Great Gatsby


In [7]:
# Define the dictionary with lists as values
book_dict = {
    'authors': ["George R.R. Martin", "J.K. Rowling", "J.K. Rowling", "Harper Lee", "Mark Twain", "Jane Austen", "F. Scott Fitzgerald"],
    'titles': ["A Game of Thrones", "Harry Potter and the Sorcerer's Stone", "Harry Potter and the Chamber of Secrets", "To Kill a Mockingbird", "The Adventures of Tom Sawyer", "Pride and Prejudice", "The Great Gatsby"]
}

# Correct loop to iterate over the indices of the lists
for i in range(len(book_dict['authors'])):  # Ensure the loop runs over the length of the authors or titles list
    print("Book '{}' by {}".format(book_dict['titles'][i], book_dict['authors'][i]))


Book 'A Game of Thrones' by George R.R. Martin
Book 'Harry Potter and the Sorcerer's Stone' by J.K. Rowling
Book 'Harry Potter and the Chamber of Secrets' by J.K. Rowling
Book 'To Kill a Mockingbird' by Harper Lee
Book 'The Adventures of Tom Sawyer' by Mark Twain
Book 'Pride and Prejudice' by Jane Austen
Book 'The Great Gatsby' by F. Scott Fitzgerald


In [8]:
# Define a list of dictionaries, where each dictionary contains an author and a title
book_list = [
    {'author': 'George R.R. Martin', 'title': 'A Game of Thrones'},
    {'author': 'J.K. Rowling', 'title': "Harry Potter and the Sorcerer's Stone"},
    {'author': 'J.K. Rowling', 'title': "Harry Potter and the Chamber of Secrets"},
    {'author': 'Harper Lee', 'title': 'To Kill a Mockingbird'},
    {'author': 'Mark Twain', 'title': 'The Adventures of Tom Sawyer'},
    {'author': 'Jane Austen', 'title': 'Pride and Prejudice'},
    {'author': 'F. Scott Fitzgerald', 'title': 'The Great Gatsby'}
]

# Loop through each book in the list and print the title and author
for book in book_list:
    print("Book '{}' by {}".format(book['title'], book['author']))


Book 'A Game of Thrones' by George R.R. Martin
Book 'Harry Potter and the Sorcerer's Stone' by J.K. Rowling
Book 'Harry Potter and the Chamber of Secrets' by J.K. Rowling
Book 'To Kill a Mockingbird' by Harper Lee
Book 'The Adventures of Tom Sawyer' by Mark Twain
Book 'Pride and Prejudice' by Jane Austen
Book 'The Great Gatsby' by F. Scott Fitzgerald
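
One of the stated objectives is to count the number of books per author. As a minimal sketch of how the list-of-dictionaries layout supports that, the cell below tallies authors with `collections.Counter`. The name `books_per_author` is an illustrative choice, not part of the original notebook.

```python
from collections import Counter

# Same list of dictionaries as in the cell above
book_list = [
    {'author': 'George R.R. Martin', 'title': 'A Game of Thrones'},
    {'author': 'J.K. Rowling', 'title': "Harry Potter and the Sorcerer's Stone"},
    {'author': 'J.K. Rowling', 'title': "Harry Potter and the Chamber of Secrets"},
    {'author': 'Harper Lee', 'title': 'To Kill a Mockingbird'},
    {'author': 'Mark Twain', 'title': 'The Adventures of Tom Sawyer'},
    {'author': 'Jane Austen', 'title': 'Pride and Prejudice'},
    {'author': 'F. Scott Fitzgerald', 'title': 'The Great Gatsby'}
]

# Count how many times each author appears (hypothetical variable name)
books_per_author = Counter(book['author'] for book in book_list)

# Print authors from most to fewest books
for author, count in books_per_author.most_common():
    print(f"{author}: {count}")
```

Because each book is a self-contained record, the author and title cannot drift out of alignment the way two parallel lists can.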


In [9]:
# One Book instance

class Book:
    def __init__(self, author, title):  # self refers to the instance being initialized; author and title are stored on it
        self.author = author
        self.title = title
        
    def display(self):
        print(f"Book '{self.title}' by {self.author}")

# Create an instance of Book
book1 = Book("George R.R. Martin", "A Game of Thrones")
book1.display()

Book 'A Game of Thrones' by George R.R. Martin


In [10]:
book1.author

'George R.R. Martin'

In [11]:
book1.title

'A Game of Thrones'

In [12]:
book1.display()

Book 'A Game of Thrones' by George R.R. Martin


In [13]:
id(book1)

2857842376272

In [14]:
# Multiple Book instances

class Book:
    def __init__(self, author, title):  # self refers to the instance being initialized; author and title are stored on it
        self.author = author
        self.title = title
        
    def display(self):
        print(f"Book '{self.title}' by {self.author}")

# Create instances of Book
book1 = Book("George R.R. Martin", "A Game of Thrones")
book2 = Book("J.K. Rowling", "Harry Potter and the Sorcerer's Stone")
book3 = Book("J.K. Rowling", "Harry Potter and the Chamber of Secrets")
book4 = Book("Harper Lee", "To Kill a Mockingbird")
book5 = Book("Mark Twain", "The Adventures of Tom Sawyer")
book6 = Book("Jane Austen", "Pride and Prejudice")
book7 = Book("F. Scott Fitzgerald", "The Great Gatsby")


# Create a list of Book objects
books = [book1, book2, book3, book4, book5, book6, book7]
for book in books:
    book.display()

Book 'A Game of Thrones' by George R.R. Martin
Book 'Harry Potter and the Sorcerer's Stone' by J.K. Rowling
Book 'Harry Potter and the Chamber of Secrets' by J.K. Rowling
Book 'To Kill a Mockingbird' by Harper Lee
Book 'The Adventures of Tom Sawyer' by Mark Twain
Book 'Pride and Prejudice' by Jane Austen
Book 'The Great Gatsby' by F. Scott Fitzgerald


In [15]:
id(book1)

2857842528368

In [16]:
id(book2)

2857842516656
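
The `Book` class above stores only an author and a title, while the Data Requirements section lists several more fields per book. A rough sketch of how a class could carry them, using a standard-library `dataclass`, might look like this. The class name `DetailedBook`, the field names, and the default values are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DetailedBook:  # hypothetical name, chosen to avoid clashing with Book
    title: str
    authors: List[str]                       # a book may have several authors
    genres: List[str] = field(default_factory=list)
    publication_year: Optional[int] = None   # None means "not recorded yet"
    isbn: Optional[str] = None
    average_rating: Optional[float] = None
    number_of_ratings: int = 0
    number_of_pages: Optional[int] = None

    def display(self):
        print(f"Book '{self.title}' by {', '.join(self.authors)}")


# Only the required fields are supplied; the rest keep their defaults
book = DetailedBook(title="A Game of Thrones", authors=["George R.R. Martin"])
book.display()
```

The dataclass generates `__init__` and `__repr__` automatically, which keeps the definition short as the number of fields grows.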

In [17]:
# Using dataframe
import pandas as pd

# Define a list of dictionaries, where each dictionary contains an author and a title
book_list = [
    {'author': 'George R.R. Martin', 'title': 'A Game of Thrones'},
    {'author': 'J.K. Rowling', 'title': "Harry Potter and the Sorcerer's Stone"},
    {'author': 'J.K. Rowling', 'title': "Harry Potter and the Chamber of Secrets"},
    {'author': 'Harper Lee', 'title': 'To Kill a Mockingbird'},
    {'author': 'Mark Twain', 'title': 'The Adventures of Tom Sawyer'},
    {'author': 'Jane Austen', 'title': 'Pride and Prejudice'},
    {'author': 'F. Scott Fitzgerald', 'title': 'The Great Gatsby'}
]

# Create a DataFrame from the list of dictionaries
books_df = pd.DataFrame(book_list)

# Print the DataFrame
print(books_df)

# Loop through each book in the DataFrame and print the title and author
for index, book in books_df.iterrows():
    print("Book '{}' by {}".format(book['title'], book['author']))


                author                                    title
0   George R.R. Martin                        A Game of Thrones
1         J.K. Rowling    Harry Potter and the Sorcerer's Stone
2         J.K. Rowling  Harry Potter and the Chamber of Secrets
3           Harper Lee                    To Kill a Mockingbird
4           Mark Twain             The Adventures of Tom Sawyer
5          Jane Austen                      Pride and Prejudice
6  F. Scott Fitzgerald                         The Great Gatsby
Book 'A Game of Thrones' by George R.R. Martin
Book 'Harry Potter and the Sorcerer's Stone' by J.K. Rowling
Book 'Harry Potter and the Chamber of Secrets' by J.K. Rowling
Book 'To Kill a Mockingbird' by Harper Lee
Book 'The Adventures of Tom Sawyer' by Mark Twain
Book 'Pride and Prejudice' by Jane Austen
Book 'The Great Gatsby' by F. Scott Fitzgerald


In [18]:
import pandas as pd

# Define the dictionary with lists as values
book_dict = {
    'authors': ["George R.R. Martin", "J.K. Rowling", "J.K. Rowling", "Harper Lee", "Mark Twain", "Jane Austen", "F. Scott Fitzgerald"],
    'titles': ["A Game of Thrones", "Harry Potter and the Sorcerer's Stone", "Harry Potter and the Chamber of Secrets", "To Kill a Mockingbird", "The Adventures of Tom Sawyer", "Pride and Prejudice", "The Great Gatsby"]
}

# Create a DataFrame from the dictionary
books_df = pd.DataFrame(book_dict)

# Print the DataFrame
print(books_df)

# Loop through each book in the dictionary and print the title and author
for i in range(len(book_dict['authors'])):  # Ensure the loop runs over the length of the authors or titles list
    print("Book '{}' by {}".format(book_dict['titles'][i], book_dict['authors'][i]))


               authors                                   titles
0   George R.R. Martin                        A Game of Thrones
1         J.K. Rowling    Harry Potter and the Sorcerer's Stone
2         J.K. Rowling  Harry Potter and the Chamber of Secrets
3           Harper Lee                    To Kill a Mockingbird
4           Mark Twain             The Adventures of Tom Sawyer
5          Jane Austen                      Pride and Prejudice
6  F. Scott Fitzgerald                         The Great Gatsby
Book 'A Game of Thrones' by George R.R. Martin
Book 'Harry Potter and the Sorcerer's Stone' by J.K. Rowling
Book 'Harry Potter and the Chamber of Secrets' by J.K. Rowling
Book 'To Kill a Mockingbird' by Harper Lee
Book 'The Adventures of Tom Sawyer' by Mark Twain
Book 'Pride and Prejudice' by Jane Austen
Book 'The Great Gatsby' by F. Scott Fitzgerald


Another approach is to create separate DataFrames for books and authors:
import pandas as pd
### Example DataFrame for books
books_df = pd.DataFrame({
    'Title': [],
    'Author': [],
    'Genre': [],
    'Publication_Year': [],
    'ISBN': [],
    'Average_Rating': [],
    'Number_of_Ratings': [],
    'Number_of_Pages': []
})

### Example DataFrame for authors
authors_df = pd.DataFrame({
    'Name': [],
    'Nationality': [],
    'Birth_Year': [],
    'Death_Year': [],
    'Total_Number_of_Books': [],
    'Average_Rating': []
})
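
With books and authors in separate DataFrames, questions that span both, such as books per nationality, need a join. The following is a small sketch under the assumption that the `Author` column of the books table matches the `Name` column of the authors table exactly. Only a few columns and rows are filled in, and `merged_df` and `books_per_nationality` are illustrative names.

```python
import pandas as pd

# A few rows using titles and authors from the earlier cells
books_df = pd.DataFrame({
    'Title': ["A Game of Thrones", "Harry Potter and the Sorcerer's Stone", "Pride and Prejudice"],
    'Author': ["George R.R. Martin", "J.K. Rowling", "Jane Austen"],
})

authors_df = pd.DataFrame({
    'Name': ["George R.R. Martin", "J.K. Rowling", "Jane Austen"],
    'Nationality': ["American", "British", "British"],
})

# Attach author details to each book (assumes Author and Name match exactly)
merged_df = books_df.merge(authors_df, left_on='Author', right_on='Name', how='left')

# Count books per author nationality
books_per_nationality = merged_df.groupby('Nationality')['Title'].count()
print(books_per_nationality)
```

Joining on a name string is fragile when spellings vary; a dedicated author identifier would likely be a safer key in a real dataset.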


### For larger datasets, the data can be stored in a SQL database:
    
CREATE TABLE books (
    book_id INTEGER PRIMARY KEY,
    title TEXT,
    author TEXT,
    genre TEXT,
    publication_year INTEGER,
    isbn TEXT,
    average_rating REAL,
    number_of_ratings INTEGER,
    number_of_pages INTEGER
);

CREATE TABLE authors (
    author_id INTEGER PRIMARY KEY,
    name TEXT,
    nationality TEXT,
    birth_year INTEGER,
    death_year INTEGER,
    total_number_of_books INTEGER,
    average_rating REAL
);
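
To try the `books` schema without setting up a database server, one option is Python's built-in `sqlite3` module. The sketch below assumes an in-memory SQLite database, inserts only titles and authors from the earlier cells, and leaves the other columns empty. The variable names are illustrative.

```python
import sqlite3

# In-memory database: nothing is written to disk (an assumption for this sketch)
conn = sqlite3.connect(":memory:")

conn.execute("""
CREATE TABLE books (
    book_id INTEGER PRIMARY KEY,
    title TEXT,
    author TEXT,
    genre TEXT,
    publication_year INTEGER,
    isbn TEXT,
    average_rating REAL,
    number_of_ratings INTEGER,
    number_of_pages INTEGER
)
""")

rows = [
    ("A Game of Thrones", "George R.R. Martin"),
    ("Harry Potter and the Sorcerer's Stone", "J.K. Rowling"),
    ("Harry Potter and the Chamber of Secrets", "J.K. Rowling"),
    ("To Kill a Mockingbird", "Harper Lee"),
]

# Parameterized inserts; unspecified columns stay NULL
conn.executemany("INSERT INTO books (title, author) VALUES (?, ?)", rows)

# Number of books per author, most prolific first
counts = conn.execute(
    "SELECT author, COUNT(*) FROM books GROUP BY author ORDER BY COUNT(*) DESC, author"
).fetchall()

for author, n in counts:
    print(author, n)

conn.close()
```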


### A popular approach is to keep the data in a CSV file and load it into a DataFrame
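
As a minimal sketch of the CSV route, the cell below reads CSV text with `pd.read_csv`. To keep the example self-contained, it uses an in-memory string through `io.StringIO` in place of a real file; with an actual file, a path would be passed instead. The column names `title` and `author` are assumptions.

```python
import io
import pandas as pd

# Stand-in for the contents of a CSV file on disk
csv_text = """title,author
A Game of Thrones,George R.R. Martin
Harry Potter and the Sorcerer's Stone,J.K. Rowling
To Kill a Mockingbird,Harper Lee
"""

# read_csv accepts a file path or any file-like object
books_df = pd.read_csv(io.StringIO(csv_text))

print(books_df)
```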