# Book Analysis

This notebook explores a dataset of book details scraped from [Tor](https://publishing.tor.com/).

## Imports

In [1]:
from os import environ

from dotenv import load_dotenv
import pandas as pd
import altair as alt

## Setup

In [2]:
load_dotenv()

pd.set_option("max_colwidth", 150)  # Display more text

## Data sourcing

In [26]:
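def string_to_list(string: str) -> list[str]:
    """Returns a list of strings from a list stored as its string representation."""
    items = string.strip()[1:-1].split(", ")
    # Strip the quotes left over from the repr; an empty list ("[]") yields [].
    return [item.strip("'\"") for item in items if item]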
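
The converter above splits on `", "`, so an entry that itself contains a comma followed by a space would be broken into two items. A possible alternative, sketched here under the assumption that the CSV stores each list as a valid Python list literal (as the sample output suggests), is to parse the literal with the standard library. The name `parse_list_literal` is hypothetical and not part of the original notebook.

```python
import ast


def parse_list_literal(string: str) -> list[str]:
    """Parse a list stored as a Python literal, e.g. "['a', 'b']"."""
    if not string.strip():
        # An empty CSV field is treated as an empty list (an assumption).
        return []
    # literal_eval only evaluates literals, so it does not execute code.
    return [str(item) for item in ast.literal_eval(string)]
```

It could be passed to `pd.read_csv` through `converters` in the same way as `string_to_list`.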

In [27]:
books = pd.read_csv(environ["FINAL_BOOK_FILEPATH"], parse_dates=["publication_date"],
                    converters={"formats": string_to_list, "contributors": string_to_list})

In [28]:
books.sample(3)

Unnamed: 0,title,description,series,series_number,pages,publication_date,formats,contributors
119,The Heirs of Locksley,"Carrie Vaughn follows up The Ghosts of Sherwood with the charming, fast-paced The Heirs of Locksley, continuing the story of Robin Hood's childre...",,,128.0,2020-08-04,"['e-Book', 'Trade Paperback']",['Carrie Vaughn']
14,Anthropocene Rag,"Anthropocene Rag is ""a rare distillation of nanotech, apocalypse, and mythic Americana into a heady psychedelic brew."" — Nebula and World Fantasy...",,,256.0,2020-03-31,"['e-Book', 'Trade Paperback']",['Alex Irvine']
11,American Hippo,"In 2017 Sarah Gailey made her debut with River of Teeth and Taste of Marrow , two action-packed novellas that introduced readers to an alternate ...",River of Teeth,0.0,256.0,2018-05-22,"['e-Book', 'Trade Paperback']",['Sarah Gailey']


## Data exploration

**How many books are missing descriptions or page counts? Can you work out why?**
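
One way to approach this, as a sketch: count the missing values per column, then pull out the affected rows to look for a pattern. It assumes missing values were read in as `NaN`, which is the `pd.read_csv` default for empty fields. The helper names `count_missing` and `rows_with_missing` are hypothetical.

```python
import pandas as pd


def count_missing(books: pd.DataFrame,
                  columns: tuple[str, ...] = ("description", "pages")) -> pd.Series:
    """Count missing (NaN) values in each of the given columns."""
    return books[list(columns)].isna().sum()


def rows_with_missing(books: pd.DataFrame,
                      columns: tuple[str, ...] = ("description", "pages")) -> pd.DataFrame:
    """Return the rows where at least one of the given columns is missing."""
    return books[books[list(columns)].isna().any(axis=1)]
```

Calling `count_missing(books)` gives the counts, and inspecting `rows_with_missing(books)` (for example its `publication_date` and `formats` columns) may hint at why the values are absent.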

**How many books are not part of a series?**
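
A minimal sketch, assuming a book without a series has a missing (`NaN`) value in the `series` column, as in the sample rows above. The function name `count_standalone` is hypothetical.

```python
import pandas as pd


def count_standalone(books: pd.DataFrame) -> int:
    """Count the books whose `series` value is missing."""
    return int(books["series"].isna().sum())
```

`count_standalone(books)` returns the count; dividing by `len(books)` gives the share of standalone titles.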

**What is the name of the longest series (the one with the most books)?**
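
The sketch below reads "longest" as "the series with the most books in the dataset", which is an assumption; ranking by total pages would need a different aggregation. The function name `longest_series` is hypothetical.

```python
import pandas as pd


def longest_series(books: pd.DataFrame) -> tuple[str, int]:
    """Return the series with the most books and its book count."""
    # value_counts ignores missing values, so standalone books are excluded.
    counts = books["series"].value_counts()
    # On a tie, idxmax returns the first series among those with the top count.
    return counts.idxmax(), int(counts.max())
```

If ties matter, `books["series"].value_counts().head()` shows the top few series side by side.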

**How many books were published in each month of the year (bar chart)?**

**What's the average number of pages?**
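
A minimal sketch using the `pages` column. `Series.mean` skips missing values by default, so books without a page count do not affect the result. The function name `average_pages` is hypothetical.

```python
import pandas as pd


def average_pages(books: pd.DataFrame) -> float:
    """Mean page count across books that have one."""
    return float(books["pages"].mean())
```

Comparing `average_pages(books)` with `books["pages"].median()` can show whether a few long books skew the mean.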

**What proportion of books have more than one author (pie chart)?**

**How many books were published each year (line chart)?**