## The effect of the _"Black Lives Matter"_ movement on the arrests of African American people in L.A.

____

### Context _(TODO)_

...

In July 2013, after the acquittal of neighborhood watch coordinator George Zimmerman in the shooting of Trayvon Martin, an African American teenager, the hashtag #BlackLivesMatter appeared on social media for the first time. This event led to the creation of the now internationally known "Black Lives Matter" movement, which campaigns against violence inflicted on Black communities.

___

### The data _(TODO)_


For our analysis, we work on the [Los Angeles Crime & Arrest Data](https://www.kaggle.com/cityofLA/los-angeles-crime-arrest-data?select=crime-data-from-2010-to-present.csv) provided by the City of Los Angeles. This dataset records the arrests made in L.A. between January 2010 and June 2019.

This dataset contains the following main variables:

- `Report ID`: ID of the arrest
- `Arrest Date`: date of the arrest
- `Time`: time of the arrest in 24-hour HHMM format (e.g. `1630` for 4:30 p.m.)
- `Area ID`: ID of the geographic area within the department
- `Area Name`: a string with the name of the area
- `Reporting District`: an integer that represents a sub-area within the geographic area
- `Age`: integer for the age of the arrested person
- `Sex Code`: a string with F for Female and M for Male
- `Descent Code`: a one-letter code for the person's descent (among them, W stands for White, B for Black, H for Hispanic/Latin/Mexican and O for Other)
- `Charge Group Code`: an integer corresponding to a category of arrest charge
- `Arrest Type Code`: the type of arrest (e.g. infraction or felony)

Other variables are also present but will not be studied in the analysis:

- `Charge Group Description`
- `Charge`
- `Charge Description`
- `Address`
- `Cross Street`
- `Location`
- `Zip Codes`
- `Census Tracts`
- `Precinct Boundaries`
- `LA Specific Plans`
- `Council Districts`
- `Neighborhood Councils (Certified)`



___

In [1]:
import pandas as pd
%matplotlib inline
import matplotlib.pyplot as plt
import seaborn as sns
import datetime
import statsmodels.api as sm
import pylab
from scipy import stats
from matplotlib.lines import Line2D


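# Skip malformed rows instead of failing (on_bad_lines replaces the removed error_bad_lines argument)
arrest_data = pd.read_csv('arrest-data-from-2010-to-present.csv', on_bad_lines = 'skip')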

Now that the data is loaded, let's take a quick look at the dataframe.

In [2]:
arrest_data.head()

| | Report ID | Arrest Date | Time | Area ID | Area Name | Reporting District | Age | Sex Code | Descent Code | Charge Group Code | ... | Charge Description | Address | Cross Street | Location | Zip Codes | Census Tracts | Precinct Boundaries | LA Specific Plans | Council Districts | Neighborhood Councils (Certified) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 5666847 | 2019-06-22T00:00:00.000 | 1630.0 | 14 | Pacific | 1457 | 44 | M | W | 24.0 | ... | VANDALISM | 12300 CULVER BL | | {'latitude': '33.992', 'human_address': '{"add... | 24031.0 | 918.0 | 1137.0 | 10.0 | 10.0 | 85.0 |
| 1 | 5666688 | 2019-06-22T00:00:00.000 | 1010.0 | 10 | West Valley | 1061 | 8 | M | O | | ... | | 19000 VANOWEN ST | | {'latitude': '34.1687', 'human_address': '{"ad... | 19339.0 | 321.0 | 1494.0 | | 4.0 | 10.0 |
| 2 | 5666570 | 2019-06-22T00:00:00.000 | 400.0 | 15 | N Hollywood | 1543 | 31 | F | O | 22.0 | ... | DRUNK DRIVING ALCOHOL/DRUGS | MAGNOLIA AV | LAUREL CANYON BL | {'latitude': '34.1649', 'human_address': '{"ad... | 8890.0 | 205.0 | 1332.0 | 17.0 | 5.0 | 39.0 |
| 3 | 5666529 | 2019-06-22T00:00:00.000 | 302.0 | 17 | Devonshire | 1738 | 23 | F | W | 22.0 | ... | DRUNK DRIVING ALCOHOL/DRUGS | HAYVENHURST ST | N REGAN FY | {'latitude': '34.2692', 'human_address': '{"ad... | 19329.0 | 69.0 | 388.0 | | 2.0 | 78.0 |
| 4 | 5666742 | 2019-06-22T00:00:00.000 | 1240.0 | 14 | Pacific | 1472 | 28 | M | W | 8.0 | ... | OBSTRUCT/RESIST EXECUTIVE OFFICER | 6600 ESPLANADE ST | | {'latitude': '33.9609', 'human_address': '{"ad... | 25075.0 | 937.0 | 241.0 | 10.0 | 10.0 | 16.0 |


For our analysis, we do not need all the columns: only those regarding race, the date of the arrest, gender and the type of arrest (e.g. infraction or felony) are of interest to us.

In [None]:
#Picking the necessary columns and renaming them
arrest_data = arrest_data[['Arrest Date', 'Sex Code', 'Descent Code', 'Arrest Type Code']] \
                .rename(columns = {'Arrest Date' : 'Date', 'Sex Code': 'Gender', 'Descent Code': 'Race', 
                                   'Arrest Type Code': 'Arrest Type'})


#Date --> DateTime
arrest_data['Date'] = pd.to_datetime(arrest_data['Date'], errors='coerce')

#Verifying that no date is 'NaT' (Not a Time)
assert(arrest_data.Date.isnull().sum() == 0)

#Sanity checks
assert(datetime.datetime(2010,1,1) == min(arrest_data.Date))
assert(datetime.datetime(2019,6,22) == max(arrest_data.Date))

arrest_data.head()

___

## Question 1

_Since the creation of the Black Lives Matter movement, was there a change in the trend of the overall arrests of African American people in L.A.?_

Let's start by looking at the number of arrests per race to get a first idea of the data.

In [None]:
race_data = arrest_data[['Date', 'Race']]

fig = plt.figure(figsize=(8,9))
ax = plot = sns.countplot(x='Race', data= race_data, palette = 'bright')
plt.title("Number of Arrests in L.A. w.r.t Race", pad = 20)
plt.ylabel('Number of Arrests', labelpad = 20)
plt.xlabel('Race', labelpad = 20)
plt.show()

#Getting count for each race that has a high number of arrests (> 5%)
counts = race_data.Race.value_counts()
black_counts = counts['B']
hispanic_counts = counts['H']
white_counts = counts['W']
others_counts = counts['O']

print('Number of Black people arrested (2010-2019) :  ' + str(black_counts))
print('Percentage: ' + str(round(black_counts/len(arrest_data)*100,3)) + "%\n")
print('Number of Hispanic people arrested (2010-2019) : ' + str(hispanic_counts))
print('Percentage: ' + str(round(hispanic_counts/len(arrest_data)*100,3)) + "%\n")
print('Number of White people arrested (2010-2019) : ' + str(white_counts))
print('Percentage: ' + str(round(white_counts/len(arrest_data)*100,3)) + "%\n")
print('Number of people of other descent arrested (2010-2019) : ' + str(others_counts))
print('Percentage: ' + str(round(others_counts/len(arrest_data)*100,3)) + "%\n")
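The counts and percentages printed above can also be gathered into a single table. This is a minimal sketch on a small hypothetical `Race` column standing in for `race_data`; as in the cell above, the percentages are divided by the total number of rows, so rows with a missing descent code still count in the denominator.

```python
import pandas as pd

# Hypothetical toy data standing in for race_data (None marks a missing descent code).
race_data = pd.DataFrame({'Race': ['B', 'H', 'H', 'W', 'B', 'O', 'H', None]})

# Arrests per descent code (missing codes are dropped from the counts)...
counts = race_data['Race'].value_counts()
# ...but, as in the cell above, percentages use all rows as the denominator.
summary = counts.to_frame('Arrests').assign(
    Percentage=(counts / len(race_data) * 100).round(3)
)
print(summary)
```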


We can already see a large discrepancy in the number of arrests across races.

This difference could partly be due to differences in the number of people of each race living in L.A. To assess this, we will scale the counts by each group's share of the population.

In [None]:
# Keep only the 3 main races (B, W, H). To compare groups of different sizes, the number of arrests of each race
# should be divided by the share of that race in the population: 76.3% W, 13.4% B, 18.5% H
# (https://www.census.gov/quickfacts/fact/table/US/PST045219). These are national U.S. figures, not L.A. ones,
# and since Hispanic origin is counted separately from race, the three shares overlap.

arrest_data_scaled = arrest_data[arrest_data.Race.isin(['W', 'B', 'H'])]
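The cell above only selects the three groups. Below is a minimal sketch of the scaling step itself, under the caveats noted there (national U.S. shares, overlapping categories): each group's arrest count is divided by its population share, so a larger value means more arrests relative to the group's size. The helper `arrests_per_population_share` and the toy counts are hypothetical.

```python
import pandas as pd

# Population shares quoted above (national U.S. figures, not L.A. ones; categories overlap).
POPULATION_SHARE = {'W': 0.763, 'B': 0.134, 'H': 0.185}

def arrests_per_population_share(races: pd.Series, shares: dict) -> pd.Series:
    """Divide each group's arrest count by that group's share of the population."""
    counts = races[races.isin(list(shares))].value_counts()
    return counts / pd.Series(shares)

# Hypothetical toy data: 10 arrests of W, 5 of B, 8 of H, plus one 'O' that is ignored.
toy = pd.Series(['W'] * 10 + ['B'] * 5 + ['H'] * 8 + ['O'])
scaled = arrests_per_population_share(toy, POPULATION_SHARE)
print(scaled.round(2))
```

On the real data the call would be `arrests_per_population_share(arrest_data_scaled.Race, POPULATION_SHARE)`. With L.A.-specific shares the values would change, so this should be read as a rough correction only.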

Now, let's look at whether the number of arrests changed after the creation of the Black Lives Matter movement. We start by counting the number of African American people arrested before and after July 2013, over the period from January 2010 to December 2016 (i.e. about 3.5 years before and after the creation of Black Lives Matter).

We begin with "non-model" empirical findings, i.e. without using a statistical model, by comparing the number of arrests of African American people before and after the creation of the movement.

#### Non-model empirical findings

In [None]:
black_data = race_data[race_data['Race'] == 'B']
movement_creation_date = datetime.datetime(2013, 8, 1)
black_before_movement = black_data[black_data['Date'] < movement_creation_date].assign(Period = "Before")
black_after_movement = black_data[(black_data['Date'] >= movement_creation_date) & 
                                  (black_data['Date'] <= datetime.datetime(2016,12,31))].assign(Period = "After")

fig = plt.figure(figsize=(6,6))
ax = plot = sns.countplot(x='Period', data= pd.concat([black_before_movement, black_after_movement]), palette = 'bright')
plt.title("Number of Arrests of African American people before and after the creation of Black Lives Matter", pad = 20)
plt.ylabel('Number of Arrests', labelpad = 20)
plt.xlabel('Period', labelpad = 20)
plt.show()


At first glance, the number of arrests of people of African American descent appears to drop markedly after the creation of the Black Lives Matter movement.

In [None]:
print('Number of Arrests Before : ' + str(len(black_before_movement)))
print('Number of Arrests After  : ' + str(len(black_after_movement)))
print('Decrease percentage : ' + str(round((len(black_before_movement)- len(black_after_movement))/len(black_before_movement)*100,2)) + "%")

Indeed, there are 37,651 fewer arrests after July 2013, a drop of about 22.86% in the arrests of African American people.
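The two periods do not have the same length (January 2010 to July 2013 is 43 months, August 2013 to December 2016 is 41 months), so comparing totals slightly overstates the drop. Comparing the mean number of arrests per month is a fairer check. This is a minimal sketch assuming a frame with a `Date` column like `black_data`; the helper name and the toy frame (one arrest per day) are hypothetical, and months with no arrest at all would simply be missing from the monthly counts.

```python
import pandas as pd

def monthly_mean_before_after(df, cutoff, end):
    """Mean number of arrests per calendar month before `cutoff` and from `cutoff` to `end`."""
    df = df[df['Date'] <= end]
    monthly = df.groupby(df['Date'].dt.to_period('M')).size()
    cutoff_period = pd.Period(cutoff, freq='M')
    before = monthly[monthly.index < cutoff_period]
    after = monthly[monthly.index >= cutoff_period]
    return {'months_before': len(before), 'mean_before': before.mean(),
            'months_after': len(after), 'mean_after': after.mean()}

# Hypothetical toy data: exactly one arrest per day from 2010 to 2016.
toy = pd.DataFrame({'Date': pd.date_range('2010-01-01', '2016-12-31', freq='D')})
result = monthly_mean_before_after(toy, pd.Timestamp('2013-08-01'), pd.Timestamp('2016-12-31'))
print(result)
```

On the real data the call would be `monthly_mean_before_after(black_data, movement_creation_date, datetime.datetime(2016, 12, 31))`.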

In [None]:
#TO add: QQplot to check normality and t-test to see if this reduction is statistically significant
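One possible way to fill in the TODO above, sketched under assumptions: compare the monthly counts of the two periods with Welch's t-test (`scipy.stats.ttest_ind` with `equal_var=False`) and check normality with `scipy.stats.probplot`, which returns the points of a normal Q-Q plot (passing `plot=plt` draws it). The helper `compare_periods` and the toy monthly counts are hypothetical. Monthly arrest counts form a time series and are likely autocorrelated, which violates the independence assumption of the t-test, so its p-value should be read with caution.

```python
import numpy as np
from scipy import stats

def compare_periods(before_counts, after_counts):
    """Welch's t-test on monthly counts, plus the Q-Q correlation of each sample with a normal."""
    t_stat, p_value = stats.ttest_ind(before_counts, after_counts, equal_var=False)
    # probplot returns ((theoretical quantiles, ordered data), (slope, intercept, r));
    # r close to 1 means the points lie close to the Q-Q line.
    r_before = stats.probplot(before_counts, dist='norm')[1][2]
    r_after = stats.probplot(after_counts, dist='norm')[1][2]
    return t_stat, p_value, r_before, r_after

# Hypothetical monthly counts (43 months before, 41 after), not taken from the dataset.
rng = np.random.default_rng(0)
before = rng.normal(3800, 150, size=43)
after = rng.normal(3000, 150, size=41)
t_stat, p_value, r_before, r_after = compare_periods(before, after)
print(f"t = {t_stat:.2f}, p = {p_value:.3g}, Q-Q r: {r_before:.3f} / {r_after:.3f}")
```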

To analyse the monthly arrest trends more rigorously, a model-based empirical analysis is required. We use a segmented regression analysis, with July 2013 (the creation of the Black Lives Matter movement) as the "interruption" point.

We will start by visualizing this regression analysis.

#### Regression analysis visualization

In [None]:
black_trends_before = black_before_movement.set_index('Date') \
                        .replace('B', 1) \
                        .rename(columns = {'Race' : 'Arrestations'}) \
                        .groupby(pd.Grouper(freq = 'M')).sum() \
                        .assign(Period = 'Before')

black_trends_after = black_after_movement.set_index('Date') \
                        .replace('B', 1) \
                        .rename(columns = {'Race' : 'Arrestations'}) \
                        .groupby(pd.Grouper(freq = 'M')).sum() \
                        .assign(Period = 'After')

black_trends = pd.concat([black_trends_before, black_trends_after]).reset_index(drop = True)
black_trends['Time'] = black_trends.index + 1

**Remark:** The first step is to prepare the dataset used for the visualization. The number of arrests of African American people is aggregated by month, and the `Period` column labels each month as before (up to July 2013) or after the creation of the movement.
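The `.replace('B', 1)` / `.sum()` chain above counts rows per month by turning each descent code into a 1. A more direct idiom counts rows with `size()`. This is a minimal sketch on a hypothetical frame with the same `Date`/`Race` columns as `black_data`; the helper name is hypothetical. One difference: grouping on monthly periods only returns months that contain at least one arrest, whereas `pd.Grouper(freq='M')` also returns empty months (with a sum of 0).

```python
import pandas as pd

def monthly_arrests(df: pd.DataFrame) -> pd.Series:
    """Number of rows (arrests) per calendar month, indexed by monthly Period."""
    return df.groupby(df['Date'].dt.to_period('M')).size().rename('Arrestations')

# Hypothetical toy data: three arrests in January 2010 and one in February 2010.
toy = pd.DataFrame({
    'Date': pd.to_datetime(['2010-01-03', '2010-01-15', '2010-01-31', '2010-02-02']),
    'Race': ['B', 'B', 'B', 'B'],
})
counts = monthly_arrests(toy)
print(counts)
```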

In [None]:
#Showing the results of the segmented linear regression
ax = sns.lmplot(x="Time", y="Arrestations", hue = "Period", data=black_trends, ci=95, palette="bright", height = 12)
ax._legend.remove()
plt.subplots_adjust(top=0.9)
ax.fig.suptitle('Pre and Post July 2013 Arrestations Trends of African American people', fontsize = 14)
ax.set_axis_labels("Time (Months)", "Total Arrestations", fontsize = 13, labelpad = 10)
plt.axvline(x = 43.5, color = 'red', alpha = 1, linestyle = '--')  # alpha must lie in [0, 1]
plt.xlim(0, 86)
plt.ylim(2200, None)
ax.set(xticks = range(0, 92, 4))
#Constructing the legend
legend_elements = [Line2D([0], [0], marker='o', color='w', label='Total Arrestations by month (Before)', markerfacecolor='blue', markersize=7.5),
                   Line2D([0], [0], marker='o', color='w', label='Total Arrestations by month (After)', markerfacecolor='darkorange', markersize=7.5),
                   Line2D([0], [0], color='blue', label = "Trend up to July 2013"), 
                   Line2D([0], [0], color='darkorange', label = "Trend after July 2013"), 
                   Line2D([0], [0], color='red', linestyle = '--', alpha = 1 , label = "Mid July 2013 - Creation of Black Lives Matter")]

plt.legend(handles=legend_elements)


plt.show()

The graph shows the monthly number of arrests (scatter plot) together with the regression line fitted on each period and its 95% confidence band.

In this graph, one can observe a large and immediate drop in the number of arrests right after the creation of the movement, as well as a marked difference between the trends of the two periods.

Let's support these findings with numerical results.

#### Numerical Segmented regression

The segmented regression model can be written as follows:

$Y_{t} = \beta_{0} + \beta_{1}\times \text{time}_{t} + \beta_{2}\times \text{intervention}_{t} + \beta_{3}\times \text{postslope}_{t} + \epsilon_{t}$

In our case, the variable $Y_{t}$ is the number of arrests of African American people in month $t$.

Let's clarify the meaning of the parameters in the above-mentioned equation:

- $\beta_{0}$ is the intercept: the baseline level of $Y_t$ at the beginning of the study (the fitted number of arrests at $\text{time} = 0$, just before the first month)
- $\beta_{1}$ is the slope of the trend before the intervention (i.e. the monthly growth rate of the arrests of African American people before the creation of Black Lives Matter)
- $\beta_{2}$ is the change in level right after the intervention; it measures the immediate effect of the event (i.e. the creation of Black Lives Matter)
- $\beta_{3}$ is the change in slope after the event; it indicates whether the trend in the number of arrests of African American people rises or declines faster than before, the post-intervention slope being $\beta_1 + \beta_3$
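Two consequences of this parametrisation help when reading the fitted coefficients. Assuming, as in the code below, that the intervention comes after month $T_0$ ($T_0 = 43$, i.e. July 2013), that $\text{intervention}_t = 1$ for $t > T_0$ (0 otherwise) and that $\text{postslope}_t = t - T_0$ for $t > T_0$ (0 otherwise), the month-to-month change of the fitted line and its gap to the pre-intervention trend extrapolated forward are:

$$
\hat{Y}_{t+1} - \hat{Y}_{t} =
\begin{cases}
\beta_{1} & t + 1 \le T_0 \\
\beta_{1} + \beta_{3} & t > T_0
\end{cases}
\qquad
\hat{Y}_{t} - (\beta_{0} + \beta_{1} t) = \beta_{2} + \beta_{3}(t - T_0), \quad t > T_0
$$

So the gap in the first post-intervention month ($t = T_0 + 1$) is $\beta_2 + \beta_3$, and $\beta_2$ alone is that jump extrapolated back to $t = T_0$.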



Let's begin by constructing the table that will be used:

In [None]:
black_data_regression = black_trends.copy()
black_data_regression['x2'] = black_data_regression['Period'].apply(lambda x : 0 if x == 'Before' else 1)
black_data_regression['x3'] = black_data_regression['Time'].apply(lambda x : x-43 if x>43 else 0)
y_df = black_data_regression['Arrestations']
black_data_regression = black_data_regression.drop(columns = ['Arrestations','Period']).rename(columns = {'Time':'x1'})
black_data_regression

**Remark:** The table above contains the three variables used to perform the segmented regression, following the equation presented above.
The column `x1` of the dataframe corresponds to the variable $\text{time}$ of the equation, the column `x2` to the variable $\text{intervention}$ (0 up to July 2013, 1 afterwards) and the column `x3` to the variable $\text{postslope}$ (the number of months elapsed since July 2013, 0 before).
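As a check that the three columns encode the model as intended, the sketch below builds them for 43 months before and 41 after the intervention (the layout used here), generates a noise-free series from chosen coefficients and recovers them by least squares. The helper name `segmented_design` and the coefficient values are hypothetical, and `numpy.linalg.lstsq` stands in for `sm.OLS` only to keep the example dependency-light.

```python
import numpy as np

def segmented_design(n_before: int, n_after: int) -> np.ndarray:
    """Columns: constant, time (x1), intervention (x2), postslope (x3)."""
    time = np.arange(1, n_before + n_after + 1)
    intervention = (time > n_before).astype(float)
    postslope = np.where(time > n_before, time - n_before, 0)
    return np.column_stack([np.ones_like(time, dtype=float), time, intervention, postslope])

X = segmented_design(43, 41)
true_beta = np.array([4000.0, -5.0, -600.0, 3.0])  # hypothetical coefficients
y = X @ true_beta                                   # noise-free series
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.round(beta_hat, 6))
```

Because the series has no noise, the fit returns the chosen coefficients exactly (up to rounding), which confirms that `x2` carries the level change and `x3` the slope change.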

Now, let's fit the regression model:

In [None]:
X_df = black_data_regression
X_df = sm.add_constant(X_df.values)
model = sm.OLS(y_df, X_df).fit()
model.summary()

## To add: analyze and conclude on the table

---

## Question 2

How does the trend in arrests of other races differ from that of African American people?

In the first part of our analysis, we found a decrease in the overall number of arrests of African American people after July 2013, when the Black Lives Matter movement was created. To check whether this reduction is actually linked to this event, a control group should be added to the analysis.

We use White people living in L.A. as the control group. The Black Lives Matter movement aims to eradicate white supremacy and focuses on violence inflicted on Black communities, so its creation should not, in principle, influence the arrest trend of white people.

#### Non-model empirical findings

As with the analysis of African American people, we begin with a non-model analysis.

Let's start by preparing the data to analyse the arrests of white people.

In [None]:
white_data = race_data[race_data['Race'] == 'W']
white_before_movement = white_data[white_data['Date'] < movement_creation_date].assign(Period = "Before")
white_after_movement = white_data[(white_data['Date'] >= movement_creation_date) & 
                                  (white_data['Date'] <= datetime.datetime(2016,12,31))].assign(Period = "After")

fig = plt.figure(figsize=(6,6))
ax = plot = sns.countplot(x='Period', data= pd.concat([white_before_movement, white_after_movement]), palette = 'bright')
plt.title("Number of Arrests of White people before and after the creation of Black Lives Matter", pad = 20)
plt.ylabel('Number of Arrests', labelpad = 20)
plt.xlabel('Period', labelpad = 20)
plt.show()

In [None]:
print('Number of Arrests Before : ' + str(len(white_before_movement)))
print('Number of Arrests After  : ' + str(len(white_after_movement)))
print('Decrease percentage : ' + str(round((len(white_before_movement)- len(white_after_movement))/len(white_before_movement)*100,2)) + "%")

One can observe a similar decrease in the number of arrests.

#### Regression analysis visualization

Let's visualize the regression analysis to have a better insight.

In [None]:
white_trends_before = white_before_movement.set_index('Date') \
                        .replace('W', 1) \
                        .rename(columns = {'Race' : 'Arrestations'}) \
                        .groupby(pd.Grouper(freq = 'M')).sum() \
                        .assign(Period = 'Before')

white_trends_after = white_after_movement.set_index('Date') \
                        .replace('W', 1) \
                        .rename(columns = {'Race' : 'Arrestations'}) \
                        .groupby(pd.Grouper(freq = 'M')).sum() \
                        .assign(Period = 'After')

white_trends = pd.concat([white_trends_before, white_trends_after]).reset_index(drop = True)
white_trends['Time'] = white_trends.index + 1

**Remark:** We start by selecting the data of interest (i.e. the arrests of white people) and aggregating it on a monthly basis.

To plot the regressions, we create a dataset that contains the arrests of both black and white people. A `label_values` function assigns a label to each row, based on its position in the combined dataset, to distinguish between arrests of white and black people, before and after July 2013.

In [None]:
def label_values(x):
    if x <= 42:
        return 'Trend of black arrests before July 2013'
    elif x <= 83:
        return 'Trend of black arrests after July 2013'
    elif x <= 126:
        return 'Trend of white arrests before July 2013'
    else:
        return 'Trend of white arrests after July 2013'

In [None]:
# Stack the black and white monthly trends into one frame with a fresh 0..n-1 index
control_trends = pd.concat([black_trends, white_trends], ignore_index = True)

#Defining the labels
control_trends['Label'] = control_trends.index.values
control_trends['Label'] = control_trends['Label'].apply(label_values)
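`label_values` relies on row positions in the combined frame (0-42, 43-83, 84-126, then the rest), so it silently mislabels rows if either period gains or loses a month. A less fragile sketch tags each frame with its group before concatenating and builds the label from the existing `Period` column. The helper `label_trends`, the `Group` column and the toy frames are hypothetical.

```python
import pandas as pd

def label_trends(trends_by_group: dict) -> pd.DataFrame:
    """Concatenate monthly trend frames and label rows, e.g. 'Trend of black arrests before July 2013'."""
    frames = [df.assign(Group=group) for group, df in trends_by_group.items()]
    combined = pd.concat(frames, ignore_index=True)
    combined['Label'] = ('Trend of ' + combined['Group'] + ' arrests '
                         + combined['Period'].str.lower() + ' July 2013')
    return combined

# Hypothetical toy frames with the same columns as black_trends / white_trends.
black = pd.DataFrame({'Arrestations': [10, 8], 'Period': ['Before', 'After'], 'Time': [1, 2]})
white = pd.DataFrame({'Arrestations': [20, 19], 'Period': ['Before', 'After'], 'Time': [1, 2]})
control = label_trends({'black': black, 'white': white})
print(control['Label'].tolist())
```

On the real data, `label_trends({'black': black_trends, 'white': white_trends})` would produce the same four labels as `label_values`.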

In [None]:
ax = sns.lmplot(x="Time", y="Arrestations", hue = "Label", data=control_trends, ci=95, palette="bright", height = 12)
ax._legend.remove()
plt.subplots_adjust(top=0.9)
ax.fig.suptitle('Pre and Post July 2013 Arrest Trends of African American and White people', fontsize = 14)
ax.set_axis_labels("Time (Months)", "Total Arrestations", fontsize = 13, labelpad = 10)
plt.axvline(x = 43.5, color = 'red', alpha = 1, linestyle = '--')  # alpha must lie in [0, 1]
plt.xlim(0, 86)
plt.ylim(1000, None)
ax.set(xticks = range(0, 92, 4))
#Constructing the legend
legend_elements = [Line2D([0], [0], marker='o', color='w', label='Total Arrests of Black people by month (Before)', markerfacecolor='blue', markersize=7.5),
                   Line2D([0], [0], marker='o', color='w', label='Total Arrests of Black people by month (After)', markerfacecolor='darkorange', markersize=7.5),
                   Line2D([0], [0], marker='o', color='w', label='Total Arrests of White people by month (Before)', markerfacecolor='green', markersize=7.5),
                   Line2D([0], [0], marker='o', color='w', label='Total Arrests of White people by month (After)', markerfacecolor='red', markersize=7.5),
                   Line2D([0], [0], color='blue', label = "Trend of Black Arrests up to July 2013"), 
                   Line2D([0], [0], color='darkorange', label = "Trend of Black Arrests after July 2013"), 
                   Line2D([0], [0], color='green', label = "Trend of White Arrests up to July 2013"), 
                   Line2D([0], [0], color='red', label = "Trend of White Arrests after July 2013"), 
                   Line2D([0], [0], color='red', linestyle = '--', alpha = 1 , label = "Mid July 2013 - Creation of Black Lives Matter")]

plt.legend(handles=legend_elements)


plt.show()

As before, the graph shows the monthly number of arrests (scatter plot) and the regression line fitted on each period, here for both Black people and White people (the control group).

In this graph, one can observe a similar trend for the arrests of white and black people after July 2013. This finding is inconsistent with our hypothesis. Before concluding, let's perform a numerical segmented regression.

#### Numerical Segmented regression

Here, we apply the same method as for the numerical segmented regression of the arrests of black people.

In [None]:
white_data_regression = white_trends.copy()
white_data_regression['x2'] = white_data_regression['Period'].apply(lambda x : 0 if x == 'Before' else 1)
white_data_regression['x3'] = white_data_regression['Time'].apply(lambda x : x-43 if x>43 else 0)
y_df = white_data_regression['Arrestations']
white_data_regression = white_data_regression.drop(columns = ['Arrestations','Period']).rename(columns = {'Time':'x1'})

X_df = white_data_regression
X_df = sm.add_constant(X_df.values)
model = sm.OLS(y_df, X_df).fit()
model.summary()

These results also show a reduction in the trend of arrests of white people after July 2013.

Looking more closely at the political history of L.A., we found that in July 2013, Eric Garcetti began his term as mayor of L.A. Mayor Garcetti aimed to make L.A. a safer place. This could explain the overall decrease in arrests (across all races) after July 2013.

---

## Question 3

Is there a difference in the number of arrests between male and female African Americans? Did it change after the creation of the Black Lives Matter movement?

In [None]:
gender_black_data = arrest_data[arrest_data['Race'] == 'B'][['Date', 'Gender']].reset_index(drop = True)

fig = plt.figure(figsize=(6,6))
ax = plot = sns.countplot(x='Gender', data=gender_black_data, palette = 'bright')
plt.title("Number of Arrests of African American people w.r.t gender (Jan 2010 - Jun 2019)", pad = 20)
plt.ylabel('Number of Arrests', labelpad = 20)
plt.xlabel('Gender', labelpad = 20)
plt.show()

In [None]:
# Same cut-off as elsewhere: arrests before 1 August 2013 belong to the 'Before' period
gender_black_data['Period'] = gender_black_data['Date'].apply(lambda x: 'Before' if x < movement_creation_date else 'After')
gender_black_data = gender_black_data[gender_black_data['Date'] <= datetime.datetime(2016,12,31)].reset_index(drop = True).sort_values(by='Date')

fig = plt.figure(figsize=(6,6))
ax = plot = sns.countplot(x='Gender', hue = 'Period',data=gender_black_data, palette = 'muted')
plt.title("Number of Arrests of African American people w.r.t gender (Jan 2010 - Dec 2016)", pad = 20)
plt.ylabel('Number of Arrests', labelpad = 20)
plt.xlabel('Gender', labelpad = 20)
plt.show()
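Raw counts make the two periods hard to compare, since they do not cover the same number of months; the share of each gender within each period is easier to read. This is a sketch with `pd.crosstab` (`normalize='columns'` divides each column by its total) on hypothetical toy rows with the `Gender` and `Period` columns of `gender_black_data`.

```python
import pandas as pd

# Hypothetical toy rows, not taken from the dataset.
toy = pd.DataFrame({
    'Gender': ['M', 'M', 'M', 'F', 'M', 'M', 'F', 'F'],
    'Period': ['Before'] * 4 + ['After'] * 4,
})
shares = pd.crosstab(toy['Gender'], toy['Period'], normalize='columns') * 100
print(shares.round(1))
```

On the real data, `pd.crosstab(gender_black_data['Gender'], gender_black_data['Period'], normalize='columns')` gives the percentage of male and female arrests in each period.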

#### Segmented regression analysis

In [None]:
female_black_data = gender_black_data[gender_black_data['Gender'] == 'F'] \
                    .set_index('Date') \
                    .replace('F', 1) \
                    .rename(columns = {'Gender' : 'Arrestations'}) \
                    .groupby(pd.Grouper(freq = 'M')).sum() \
                    .reset_index() \
                    .sort_values(by = 'Date')

female_black_data_before = female_black_data[female_black_data['Date'] < datetime.datetime(2013, 8, 1)] \
                            .assign(Period = 'Before Female') \
                            .assign(Time = lambda x : x.index + 1)

female_black_data_after = female_black_data[female_black_data['Date'] >= datetime.datetime(2013, 8, 1)] \
                            .assign(Period = 'After Female') \
                            .assign(Time = lambda x : x.index + 1)


male_black_data = gender_black_data[gender_black_data['Gender'] == 'M'] \
                    .set_index('Date') \
                    .replace('M', 1) \
                    .rename(columns = {'Gender' : 'Arrestations'}) \
                    .groupby(pd.Grouper(freq = 'M')).sum() \
                    .reset_index() \
                    .sort_values(by = 'Date')

male_black_data_before = male_black_data[male_black_data['Date'] < datetime.datetime(2013, 8, 1)] \
                            .assign(Period = 'Before Male') \
                            .assign(Time = lambda x : x.index + 1)

male_black_data_after = male_black_data[male_black_data['Date'] >= datetime.datetime(2013, 8, 1)] \
                            .assign(Period = 'After Male') \
                            .assign(Time = lambda x : x.index + 1)

In [None]:
ax = sns.lmplot(x="Time", y="Arrestations", hue = "Period", data=pd.concat([male_black_data_before, female_black_data_before, male_black_data_after, female_black_data_after]),
                ci=95, palette="bright", height = 12)

plt.subplots_adjust(top=0.9)
ax._legend.remove()
ax.fig.suptitle('Pre and Post July 2013 Arrest Trends of African American people by gender', fontsize = 14)
ax.set_axis_labels("Time (Months)", "Total Arrestations", fontsize = 13, labelpad = 10)
plt.axvline(x = 43.5, color = 'red', alpha = 1, linestyle = '--')  # alpha must lie in [0, 1]
plt.xlim(0, 86)
plt.ylim(500, None)
ax.set(xticks = range(0, 92, 4))

plt.legend()


plt.show()

#### Male segmented regression

In [None]:
male_trends = pd.concat([male_black_data_before, male_black_data_after])
male_trends['x2'] = male_trends.Period.apply(lambda x : 0 if x=='Before Male' else 1)
male_trends['x3'] = male_trends.Time.apply(lambda x: x-43 if x>43 else 0)
y_df = male_trends['Arrestations']
male_trends = male_trends.drop(columns = ['Date', 'Period', 'Arrestations']).rename(columns = {'Time':'x1'})

In [None]:
X_df = male_trends
X_df = sm.add_constant(X_df.values)
model = sm.OLS(y_df, X_df).fit()
model.summary()

#### Female segmented regression

In [None]:
female_trends = pd.concat([female_black_data_before, female_black_data_after])
female_trends['x2'] = female_trends.Period.apply(lambda x : 0 if x=='Before Female' else 1)
female_trends['x3'] = female_trends.Time.apply(lambda x: x-43 if x>43 else 0)
y_df = female_trends['Arrestations']
female_trends = female_trends.drop(columns = ['Date', 'Period', 'Arrestations']).rename(columns = {'Time':'x1'})

In [None]:
X_df = female_trends
X_df = sm.add_constant(X_df.values)
model = sm.OLS(y_df, X_df).fit()
model.summary()