# BechdelAI tutorial #1 - Fetching data from bechdeltest.com

> **Objective of the tutorial**: learn how to fetch data from bechdeltest.com, a website that lists 10k+ movies and their scores on the Bechdel test.

In [1]:
# Standard imports
import pandas as pd

In [2]:
# For developers who want to use the latest development version or a local copy of the library
# Use poetry to install dependencies
import sys
sys.path.append("../")  # Or change this to the folder containing your local copy of the library

import bechdelai

# About the Bechdel test

![](https://upload.wikimedia.org/wikipedia/en/b/bf/Dykes_to_Watch_Out_For_%28Bechdel_test_origin%29.jpg)

The Bechdel test is a simple test used to evaluate the representation of women in movies, TV shows, and other media. The test was popularized by the cartoonist Alison Bechdel in her comic strip "Dykes to Watch Out For" in 1985.

The test consists of three criteria:

1. The movie must have at least two named women in it.
2. The women must talk to each other.
3. They must talk about something other than a man.

The Bechdel test is not intended to be a definitive measure of a work's feminist content or quality. Instead, it is meant to be a starting point for a conversation about how women are portrayed in media. Some people argue that passing the Bechdel test is a low bar and that it doesn't necessarily mean that a movie is feminist or good, while others argue that failing the test is a clear indication of gender inequality in media.

Many movies, even popular ones, fail the Bechdel test, which has led to discussions about the representation of women in Hollywood and the media industry in general.

# Fetching data from bechdeltest.com

Bechdel test ratings for around 10k movies are listed on the website https://bechdeltest.com/.

We can fetch this data from the bechdeltest.com API to compute various statistics. In this tutorial, we will learn how to build the visualization below, which shows how many movies pass the test over the years.

![](https://cdn.vox-cdn.com/thumbor/ThiG4oPhDKL1UJukOC_bFeB5JGQ=/1400x0/filters:no_upscale()/cdn.vox-cdn.com/uploads/chorus_asset/file/3332436/hickey-bechdel-11.0.png)

In [None]:
# Importing the data helper functions
from bechdelai.data.bechdeltestcom import fetch_all_data

# Simply call the API and request all the data referenced in the website
data = fetch_all_data()
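
For readers curious about what such a fetch might involve, here is a minimal sketch using only the standard library and pandas. It is not the `bechdelai` implementation: the helper names `fetch_all_movies` and `records_to_frame` are hypothetical, and the endpoint URL and the record fields (`id`, `imdbid`, `title`, `year`, `rating`) are assumptions about the public bechdeltest.com API that should be checked against its documentation. The sample records are made up so the parsing step can be tried offline.

```python
import json
from urllib.request import urlopen

import pandas as pd

# Assumption: public "all movies" endpoint of bechdeltest.com (not confirmed by this tutorial)
ALL_MOVIES_URL = "http://bechdeltest.com/api/v1/getAllMovies"


def records_to_frame(records):
    """Turn a list of JSON records into a DataFrame with numeric year and rating."""
    df = pd.DataFrame.from_records(records)
    for col in ("year", "rating"):
        if col in df.columns:
            df[col] = pd.to_numeric(df[col], errors="coerce")
    return df


def fetch_all_movies(url=ALL_MOVIES_URL, timeout=30):
    """Download the full movie list and return it as a DataFrame (hypothetical helper)."""
    with urlopen(url, timeout=timeout) as response:
        records = json.load(response)
    return records_to_frame(records)


# Offline illustration with made-up records (not real ratings)
sample = [
    {"id": 1, "imdbid": "0000001", "title": "Example A", "year": "1999", "rating": "3"},
    {"id": 2, "imdbid": "0000002", "title": "Example B", "year": "2005", "rating": "1"},
]
sample_df = records_to_frame(sample)
```

Keeping the parsing step separate from the network call makes it possible to test the conversion without hitting the website.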

# Exploring the dataset

In [None]:
data.head()
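
The `rating` column used in the rest of this tutorial can be read against the three criteria listed earlier. The sketch below assumes that ratings run from 0 to 3, with one point for each criterion met in order; this is an assumption about the dataset, and the rows are made up for illustration. The names `RATING_LABELS`, `rating_label` and `passes` are hypothetical.

```python
import pandas as pd

# Assumption: ratings run from 0 to 3, one point per criterion met, in order
RATING_LABELS = {
    0: "fewer than two named women",
    1: "two named women, but they do not talk to each other",
    2: "they talk to each other, but only about a man",
    3: "passes all three criteria",
}

# Made-up rows for illustration only
toy = pd.DataFrame({"title": ["Example A", "Example B"], "rating": [3, 1]})

# Attach a readable label and a boolean pass/fail flag
toy["rating_label"] = toy["rating"].map(RATING_LABELS)
toy["passes"] = toy["rating"] == 3
```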

Let's visualize the dataset to recreate the graph above.

In [None]:
# Count the number of movies for each (year, rating) pair
count_per_year = data.groupby(["year", "rating"], as_index=False)["id"].count()
count_per_year.head()

We see that more movies are referenced in the past 20 years, and it looks like more movies have passed the test, but we would need to normalize per year to see the evolution.

In [None]:
import plotly.express as px

px.area(count_per_year,
        x = "year",
        y = "id",
        color = "rating",
        height = 300,
)

In [None]:
px.area(count_per_year,
        x = "year",
        y = "id",
        color = "rating",
        height = 400,
        groupnorm = "percent",
)
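
The `groupnorm = "percent"` option rescales the stacked areas so that each year sums to 100. The same shares can also be computed explicitly in pandas, which is handy for inspecting the numbers behind the chart. The sketch below uses made-up counts in the same shape as `count_per_year`; the names `toy_counts` and `share_pct` are illustrative.

```python
import pandas as pd

# Made-up counts in the same shape as count_per_year (year, rating, id)
toy_counts = pd.DataFrame({
    "year": [2000, 2000, 2001, 2001],
    "rating": [1, 3, 1, 3],
    "id": [2, 6, 5, 5],
})

# Total number of movies per year, aligned row by row with the original frame
totals = toy_counts.groupby("year")["id"].transform("sum")

# Share of each rating within its year, in percent
toy_counts["share_pct"] = 100 * toy_counts["id"] / totals
```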

However, we have no other information to contextualize the results (budget, genre, cast, ...). But we do have each movie's IMDb ID, which lets us match it against a movie metadata database. That's what we are going to do in the next tutorial!
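
As a small preview of that matching step, here is a sketch of a join on the IMDb identifier. Everything in it is illustrative: the column name `imdbid`, the `metadata` table and its `genre` column are assumptions, and the rows are made up.

```python
import pandas as pd

# Made-up rows; the column name "imdbid" is an assumption about the dataset
bechdel = pd.DataFrame({"imdbid": ["0000001", "0000002"], "rating": [3, 1]})

# Hypothetical metadata table keyed by the same identifier
metadata = pd.DataFrame({"imdbid": ["0000001", "0000003"], "genre": ["Drama", "Comedy"]})

# A left join keeps every rated movie, even when no metadata matches
merged = bechdel.merge(metadata, on="imdbid", how="left")
```

In practice the identifier format may differ between sources (for example with or without a prefix), so the keys may need to be normalized before merging.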