# Hate Crime Address conversion notebook (do not run this again)

The [DC hate crime dataset](https://mpdc.dc.gov/node/1334781) records each incident's location only as a free-text street address in Washington, DC. This notebook contains the code and process to convert each street address to coordinates and then to a zipcode, so that zipcode-based maps can be used. Because the address is free text, it is prone to errors and misspellings and needs several cleanup steps. The dataset also contains the police district, which we can visualize in maps, but zipcodes give greater specificity and can be linked to other demographic information.

In [None]:
import pandas as pd
pd.options.mode.chained_assignment = None
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

In [None]:
#  !pip install openpyxl 



In [None]:
# import hate crimes file
df = pd.read_excel('Hate Crimes Open Data_6.xlsx')
df.head()

Unnamed: 0,Date of Offense,Time of Offense,Date Offense Reported,Report Year,Month,CCN,District,Block Location,Type of Hate Bias,Targeted Group,Top Offense Type
0,2012-01-08,1500-1505,2012-01-08,2012,1,12003845,3D,1600 B/O 17th St NW,Sexual Orientation,,Threats
1,2012-01-12,1722,2012-01-12,2012,1,12005834,1D,3rd St SW & K St SW,Sexual Orientation,,Robbery
2,2012-01-13,1255-1258,2012-01-13,2012,1,12006285,4D,Park Rd NW & Sherman Ave NW,Race,Unspecified,Simple Assault
3,2012-01-14,0240-0250,2012-01-14,2012,1,12006716,3D,1800 B/O 14th St NW,Sexual Orientation,,Simple Assault
4,2012-01-14,0431-0433,2012-01-14,2012,1,12006742,3D,18th St NW & Florida Ave NW,Ethnicity/National Origin,Arab/Middle Eastern,Simple Assault


## Preprocessing

In [None]:
# one value in District = Unk
df = df.loc[df.District !='Unk']

# Some district values are '2D ' instead of '2D'
df.loc[df['District'] == '2D ', 'District'] = '2D'

# Remove the D in the districts since the geojsons only need the number
df['District'] = df['District'].str.replace('D', '') # remove D

# the column "Targeted Group " has an extra space, fix it
df = df.rename(columns={"Targeted Group ": "Targeted Group"}, errors="raise")

# Change Targeted Group NaNs to "Not Reported"
df['Targeted Group'] = df['Targeted Group'].fillna('Not Reported')

# some targeted group values say "Black " and "Black  "
df.loc[df['Targeted Group'].isin(['Black ', 'Black  ']), 'Targeted Group'] = 'Black'
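
The exact-match fixes above cover the trailing-space variants seen in this file. As a hedged alternative, stripping whitespace from the whole column would catch any such variant in one step. The sample values below are made up for illustration and are not rows from the dataset.

```python
import pandas as pd

# Hypothetical sample mimicking the trailing-space problem described above
sample = pd.DataFrame({
    "District": ["2D ", "3D", "2D"],
    "Targeted Group": ["Black ", "Black  ", None],
})

# Strip surrounding whitespace, then drop the "D" so only the number remains
sample["District"] = (
    sample["District"].str.strip().str.replace("D", "", regex=False)
)

# Strip whitespace first, then label the missing values
sample["Targeted Group"] = (
    sample["Targeted Group"].str.strip().fillna("Not Reported")
)

print(sample)
```

Stripping changes every value in the column, so it is worth comparing `value_counts()` before and after to confirm nothing unexpected was merged.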

## Visualize the hate crimes per Police District

In [None]:
# load file
import json
with open('police-districts-mpd.geojson') as f:
    dc_pd = json.load(f)

### View overall hate crime counts per district

In [None]:
# make a dataframe to count up the crimes per district
# rename_axis + reset_index(name=...) yields the same column names on old and new pandas versions
district_count = df['District'].value_counts().rename_axis('DISTRICT').reset_index(name='Count')

In [None]:
# view the graph
import plotly.express as px
fig = px.choropleth_mapbox(district_count, geojson=dc_pd, locations="DISTRICT", color='Count',
                           featureidkey="properties.DISTRICT",
                           color_continuous_scale="sunset",
                           range_color=(0, 350),
                           mapbox_style="carto-positron",
                           zoom=10, center = {"lat": 38.9072, "lon": -77.0369},
                           opacity=0.5,
                           labels={'Count':'Count'}
                          )
fig.update_layout(margin={"r":0,"t":0,"l":0,"b":0})
fig.show()

### View individual hate crime types by district

In [None]:
# one row per district with the count for a single bias type
# just do a couple for now, can do all in a loop later ;)
# rename_axis + reset_index(name=...) yields the same column names on old and new pandas versions
sexual_orientation_count = (df.loc[df['Type of Hate Bias']=='Sexual Orientation', 'District']
                            .value_counts().rename_axis('DISTRICT').reset_index(name='Count'))
race_count = (df.loc[df['Type of Hate Bias']=='Race', 'District']
              .value_counts().rename_axis('DISTRICT').reset_index(name='Count'))
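
The comment above mentions doing all bias types in a loop later. One possible sketch, assuming the same `District` and `Type of Hate Bias` columns, groups once and then splits the result into one frame per bias type. The sample rows and the `counts_by_bias` name are hypothetical.

```python
import pandas as pd

# Hypothetical sample with the two columns used above
sample = pd.DataFrame({
    "District": ["3", "3", "1", "4", "3"],
    "Type of Hate Bias": ["Sexual Orientation", "Race", "Race",
                          "Race", "Sexual Orientation"],
})

# One row per (bias type, district) pair with its count
counts = (
    sample.groupby(["Type of Hate Bias", "District"])
    .size()
    .reset_index(name="Count")
    .rename(columns={"District": "DISTRICT"})
)

# Split into one frame per bias type, keyed by the bias name
counts_by_bias = {
    bias: group.drop(columns="Type of Hate Bias").reset_index(drop=True)
    for bias, group in counts.groupby("Type of Hate Bias")
}

print(counts_by_bias["Race"])
```

Each value in `counts_by_bias` has the `DISTRICT` and `Count` columns that the choropleth calls in this notebook expect.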

In [None]:
# sexual orientation
fig = px.choropleth_mapbox(sexual_orientation_count, geojson=dc_pd, locations="DISTRICT", color='Count',
                           featureidkey="properties.DISTRICT",
                           color_continuous_scale="sunset",
                           range_color=(0, 125),
                           mapbox_style="carto-positron",
                           zoom=10, center = {"lat": 38.9072, "lon": -77.0369},
                           opacity=0.5,
                           labels={'Count':'Count'}
                          )
fig.update_layout(margin={"r":0,"t":0,"l":0,"b":0})
fig.show()

In [None]:
# race
fig = px.choropleth_mapbox(race_count, geojson=dc_pd, locations="DISTRICT", color='Count',
                           featureidkey="properties.DISTRICT",
                           color_continuous_scale="sunset",
                           range_color=(0, 90),
                           mapbox_style="carto-positron",
                           zoom=10, center = {"lat": 38.9072, "lon": -77.0369},
                           opacity=0.5,
                           labels={'Count':'Count'}
                          )
fig.update_layout(margin={"r":0,"t":0,"l":0,"b":0})
fig.show()

# Converting the given address to a coordinate
Referencing [these steps](https://towardsdatascience.com/geocode-with-python-161ec1e62b89)

In [None]:
# !pip install geopandas --quiet
# !pip install geopy --quiet
import geopandas
import geopy
from geopy.geocoders import Nominatim



## Preprocess the addresses to avoid errors

### Fix addresses that are malformed.
#### 50 addresses are intersections (two street names joined by '&'), so Nominatim cannot retrieve their coordinates. For these we manually enter the latitude and longitude, looked up on Google Maps.

In [None]:
## Fix malformed addresses ##
# Remove B/O and put into new column Address
df['Address'] = df['Block Location'].str.replace("B/O", '')

# Append with Washington DC
df['Address'] = df['Address'].astype(str) + ' Washington DC'

# remove 'blk of' and variations
df['Address'] = df['Address'].str.replace('blk of', '')
df['Address'] = df['Address'].str.replace('Blk of', '')
df['Address'] = df['Address'].str.replace('BLK of', '')
df['Address'] = df['Address'].str.replace('blk', '')
df['Address'] = df['Address'].str.replace('Blk', '')
df['Address'] = df['Address'].str.replace('BLK', '')

# remove apartments
df['Address'] = df['Address'].str.replace('Apt B5', '')

# fix ones with a strange Unit
df.loc[df['Address']=='Unit  O St NW Washington DC', 'Address'] = 'O St NW Washington DC'
df.loc[df['Address']=='Unit  M St SW Washington DC', 'Address'] = 'M St SW Washington DC'
df.loc[df['Address']=='Unit  M St SE Washington DC', 'Address'] = 'M St SE Washington DC'
df.loc[df['Address']=='Unit  Florida Ave  NE Washington DC', 'Address'] = 'Florida Ave NE Washington DC'

# now remove any remaining 'Unit' (the exact matches above also tidy the leftover spaces)
df['Address'] = df['Address'].str.replace('Unit', '')

# fix misspellings and abbreviations
df['Address'] = df['Address'].str.replace('Tinidad', 'Trinidad')
df['Address'] = df['Address'].str.replace('U Stre NW', 'U Street NW')
df['Address'] = df['Address'].str.replace('Conn Ave', 'Connecticut Ave')
df['Address'] = df['Address'].str.replace('Pennslylvania Abe', 'Pennsylvania Ave')
df['Address'] = df['Address'].str.replace('Martin Luther King Ave SE', 'Martin Luther King Jr Ave SE')
df['Address'] = df['Address'].str.replace('New Jersey Ave NE', '400  New Jersey Ave NW')
df['Address'] = df['Address'].str.replace('RockCreek', 'Rock Creek')
df['Address'] = df['Address'].str.replace('Summer Rd SE', 'Sumner Rd SE')
df['Address'] = df['Address'].str.replace('Owens Pl NE', 'Owen Pl NE')

# fix other broken ones
df.loc[df['Address']=='3300   Martin Luther King Jr Ave Washington DC', 'Address'] = '3300 Martin Luther King Jr Ave SE Washington DC'
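
The chain of literal replacements above could also be written as a few regular expressions. The sketch below is one such consolidation, not the cleanup this notebook actually ran. `clean_block_location` is a hypothetical helper, and unlike the cell above it also collapses the double spaces left behind.

```python
import re

def clean_block_location(raw: str) -> str:
    """Turn a free-text block location into an address string for geocoding."""
    text = raw.replace("B/O", " ")
    # Remove 'blk', 'Blk of', 'BLK of', ... with one case-insensitive pattern
    text = re.sub(r"\bblk\b(\s+of\b)?", " ", text, flags=re.IGNORECASE)
    # Remove a standalone 'Unit' token
    text = re.sub(r"\bUnit\b", " ", text)
    # Collapse the whitespace left behind by the removals
    text = re.sub(r"\s+", " ", text).strip()
    return f"{text} Washington DC"

print(clean_block_location("1600 B/O 17th St NW"))
```

With pandas this would be applied as `df['Block Location'].astype(str).map(clean_block_location)`. The dataset-specific misspelling fixes would still need their own replacements.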

Keep the broken addresses (the intersections and one completely broken address) and the working addresses in separate dataframes, to be joined later. The manually entered latitude and longitude points are stored in a separate file, loaded below as `broken_coordinates`.

In [None]:
# keep the intersections in a separate df for now
# make a list of intersections and broken (51 total)
broken_addresses = df[(df['Address'].str.contains("&")) | (df['Address'] == '3300  Water St NW Washington DC')]
broken_list = broken_addresses['Address'].tolist()
working_addresses = df[~df['Address'].isin(broken_list)].copy()  # copy so new columns can be added safely

### Update all addresses now to have coordinates, aside from the intersections

In [None]:
## Input gcode 
geolocator = Nominatim(user_agent="your_app_name")
working_addresses['gcode'] = working_addresses.Address.apply(geolocator.geocode)

In [None]:
# retrieve lat long
working_addresses['lat'] = [g.latitude for g in working_addresses.gcode]
working_addresses['long'] = [g.longitude for g in working_addresses.gcode]
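
Two things can go wrong in the cells above: the public Nominatim service asks for at most one request per second, and `geocode` returns `None` when it finds no match, which would break the `g.latitude` list comprehension. The sketch below shows one way to handle both. `geocode_all` and `fake_geocode` are hypothetical names, and the stand-in geocoder is used so the example runs without network access.

```python
import time
from types import SimpleNamespace

def geocode_all(addresses, geocode, min_delay_seconds=1.0):
    """Geocode each address with a pause between calls.

    Returns (address, lat, lon) tuples; unmatched addresses get None coordinates.
    """
    results = []
    for i, address in enumerate(addresses):
        if i > 0:
            time.sleep(min_delay_seconds)  # stay under the service's rate limit
        location = geocode(address)
        if location is None:
            results.append((address, None, None))
        else:
            results.append((address, location.latitude, location.longitude))
    return results

# Stand-in geocoder for demonstration; the real call would be geolocator.geocode
def fake_geocode(address):
    known = {
        "600 H St NW Washington DC":
            SimpleNamespace(latitude=38.899729, longitude=-77.020065),
    }
    return known.get(address)

rows = geocode_all(
    ["600 H St NW Washington DC", "nowhere"], fake_geocode, min_delay_seconds=0
)
print(rows)
```

geopy also ships a ready-made wrapper, `geopy.extra.rate_limiter.RateLimiter(geolocator.geocode, min_delay_seconds=1)`, which could replace the manual `time.sleep` here.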

In [None]:
# upload broken addresses
broken_coordinates = pd.read_excel('broke_coordinates.xlsx')  

### Merge the 2 dataframes now to have the master one with all coordinates!

In [None]:
# concat
final_df = pd.concat([working_addresses,broken_coordinates],ignore_index=True)
final_df

Unnamed: 0,Date of Offense,Time of Offense,Date Offense Reported,Report Year,Month,CCN,District,Block Location,Type of Hate Bias,Targeted Group,Top Offense Type,Address,gcode,lat,long
0,2012-01-08 00:00:00,1500-1505,2012-01-08 00:00:00,2012,1,12003845,3,1600 B/O 17th St NW,Sexual Orientation,Not Reported,Threats,1600 17th St NW Washington DC,"(1600, 17th Street Northwest, Dupont Circle, W...",38.911250,-77.038588
1,2012-01-14 00:00:00,0240-0250,2012-01-14 00:00:00,2012,1,12006716,3,1800 B/O 14th St NW,Sexual Orientation,Not Reported,Simple Assault,1800 14th St NW Washington DC,"(1800, 14th Street Northwest, Cardozo/Shaw, Wa...",38.914217,-77.032054
2,2012-01-22 00:00:00,145,2012-01-22 00:00:00,2012,1,12010626,7,1300 B/O Alabama Ave SE,Sexual Orientation,Not Reported,ADW,1300 Alabama Ave SE Washington DC,"(1300, Alabama Avenue Southeast, Washington, D...",38.845056,-76.987412
3,2012-01-22 00:00:00,1930,2012-01-22 00:00:00,2012,1,12010912,1,600 B/O H St NW,Race,Asian,Threats,600 H St NW Washington DC,"(600, H Street Northwest, Chinatown, Washingto...",38.899729,-77.020065
4,2012-02-06 00:00:00,1900,2012-02-06 00:00:00,2012,2,12018396,3,1300 B/O Park Rd NW,Sexual Orientation,Not Reported,Simple Assault,1300 Park Rd NW Washington DC,"(1300, Park Road Northwest, Columbia Heights, ...",38.931198,-77.029808
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1250,2015-10-17,0305-0310,2015-10-17,2015,10,15164932,3,Vermont Ave NW & U St NW,Race,Black,Simple Assault,Vermont Ave NW & U St NW Washington DC,,38.916986,-77.025351
1251,2015-10-23,1640-1651,2015-10-23,2015,10,15168831,1,13th St NW & G ST NW,Sexual Orientation,Not Reported,Simple Assault,13th St NW & G ST NW Washington DC,,38.898310,-77.029614
1252,2015-11-05,1820-1828,2015-11-05,2015,11,15176719,1,7th St NW & Indiana Ave NW,Ethnicity/National Origin,Arab/Middle Eastern,Simple Assault,7th St NW & Indiana Ave NW Washington DC,,38.893892,-77.021878
1253,2017-07-07,2035,2017-07-07,2017,7,17116470,2,Central Ave SE & Southern Ave SE,Sexual Orientation,Not Reported,ADW,Central Ave SE & Southern Ave SE Washington DC,,38.885720,-76.918578


# Extract zipcodes from the coordinates

#### Load the final df and extract the zip codes by reverse geocoding the coordinates. This may seem like a strange step, since we just derived the coordinates from the street address, but it was not possible to extract a zipcode directly from the street address.

In [None]:
# load and preprocess
final_df = pd.read_csv('final_hate_crimes.csv')
final_df = final_df.drop(columns=['Unnamed: 0'])
final_df['Date of Offense'] = pd.to_datetime(final_df['Date of Offense'])
final_df['Date Offense Reported'] = pd.to_datetime(final_df['Date Offense Reported'])
final_df

Unnamed: 0,Date of Offense,Time of Offense,Date Offense Reported,Report Year,Month,CCN,District,Block Location,Type of Hate Bias,Targeted Group,Top Offense Type,Address,gcode,lat,long
0,2012-01-08,1500-1505,2012-01-08,2012,1,12003845,3,1600 B/O 17th St NW,Sexual Orientation,Not Reported,Threats,1600 17th St NW Washington DC,"1600, 17th Street Northwest, Dupont Circle, Wa...",38.911250,-77.038588
1,2012-01-14,0240-0250,2012-01-14,2012,1,12006716,3,1800 B/O 14th St NW,Sexual Orientation,Not Reported,Simple Assault,1800 14th St NW Washington DC,"1800, 14th Street Northwest, Cardozo/Shaw, Was...",38.914217,-77.032054
2,2012-01-22,145,2012-01-22,2012,1,12010626,7,1300 B/O Alabama Ave SE,Sexual Orientation,Not Reported,ADW,1300 Alabama Ave SE Washington DC,"1300, Alabama Avenue Southeast, Washington, Di...",38.845056,-76.987412
3,2012-01-22,1930,2012-01-22,2012,1,12010912,1,600 B/O H St NW,Race,Asian,Threats,600 H St NW Washington DC,"600, H Street Northwest, Chinatown, Washington...",38.899729,-77.020065
4,2012-02-06,1900,2012-02-06,2012,2,12018396,3,1300 B/O Park Rd NW,Sexual Orientation,Not Reported,Simple Assault,1300 Park Rd NW Washington DC,"1300, Park Road Northwest, Columbia Heights, W...",38.931198,-77.029808
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1250,2015-10-17,0305-0310,2015-10-17,2015,10,15164932,3,Vermont Ave NW & U St NW,Race,Black,Simple Assault,Vermont Ave NW & U St NW Washington DC,,38.916986,-77.025351
1251,2015-10-23,1640-1651,2015-10-23,2015,10,15168831,1,13th St NW & G ST NW,Sexual Orientation,Not Reported,Simple Assault,13th St NW & G ST NW Washington DC,,38.898310,-77.029614
1252,2015-11-05,1820-1828,2015-11-05,2015,11,15176719,1,7th St NW & Indiana Ave NW,Ethnicity/National Origin,Arab/Middle Eastern,Simple Assault,7th St NW & Indiana Ave NW Washington DC,,38.893892,-77.021878
1253,2017-07-07,2035,2017-07-07,2017,7,17116470,2,Central Ave SE & Southern Ave SE,Sexual Orientation,Not Reported,ADW,Central Ave SE & Southern Ave SE Washington DC,,38.885720,-76.918578


In [None]:
import geopy

# FUNCTION TO EXTRACT ZIP CODE
# called once per row via DataFrame.apply(axis=1)
def get_zipcode(row, geolocator, lat_field, lon_field):
    try:
        location = geolocator.reverse((row[lat_field], row[lon_field]))
        return location.raw['address']['postcode']
    except (AttributeError, KeyError, ValueError):
        # print the coordinates that could not be resolved to a zipcode
        print(row[lat_field], row[lon_field])
        return None


# longlat = pd.read_csv('longlat.csv', sep='\t')
geolocator = geopy.Nominatim(user_agent='hii') #My OpenMap username

# apply
final_df['zipcodes'] = final_df.apply(
    get_zipcode, axis=1, geolocator=geolocator, 
    lat_field='lat', lon_field='long')

38.907363 -77.029726
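
Several rows share the same location (for example, 1600 17th St NW appears more than once), so each reverse lookup could be cached to cut down on requests. The sketch below wraps any reverse-geocoding callable. `make_cached_zip_lookup` and `fake_reverse` are hypothetical names, and the stub returns a fixed postcode purely for illustration.

```python
from types import SimpleNamespace

def make_cached_zip_lookup(reverse):
    """Wrap a reverse-geocoding callable so each coordinate pair is looked up once."""
    cache = {}

    def lookup(lat, lon):
        key = (round(lat, 6), round(lon, 6))
        if key not in cache:
            location = reverse(key)
            # A missing location or missing postcode both end up as None
            raw = getattr(location, "raw", None) or {}
            cache[key] = raw.get("address", {}).get("postcode")
        return cache[key]

    return lookup

# Stand-in for geolocator.reverse that records how often it is called
calls = []
def fake_reverse(point):
    calls.append(point)
    return SimpleNamespace(raw={"address": {"postcode": "20009"}})

lookup = make_cached_zip_lookup(fake_reverse)
zips = [lookup(38.911250, -77.038588), lookup(38.911250, -77.038588)]
print(zips, len(calls))
```

A failed lookup is cached as `None` and is not retried, so rows that come back empty still need the manual follow-up described below.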


## Correct the zipcodes that were not appropriately formed
There are instances of '2005' instead of 20005, etc. There is one that is nan.

In [None]:
# load in-progress df
final_df = pd.read_csv('hate_crimes-zipcode_cleaning_inprogress.csv')
final_df = final_df.drop(columns=['Unnamed: 0'])

In [None]:
## find broken ones ##
# nan
final_df[final_df['zipcodes'].isna()]

# 28 say 2005 which is incorrect - we must check what's in the gcode value, which contains the zip code, and then fix the zip field
final_df[['gcode','zipcodes']].loc[final_df['zipcodes']=='2005']

Unnamed: 0,gcode,zipcodes
0,"1600, 17th Street Northwest, Dupont Circle, Wa...",2005
80,"1300, 14th Street Northwest, Logan Circle/Shaw...",2005
144,"1300, Corcoran Street Northwest, Logan Circle/...",2005
158,"1600, 17th Street Northwest, Dupont Circle, Wa...",2005
161,"1900, 14th Street Northwest, Cardozo/Shaw, Was...",2005
267,"14th Street Post Office, 2000, 14th Street Nor...",2005
283,"1600, 17th Street Northwest, Dupont Circle, Wa...",2005
294,"Logan Circle Laundry, 1100, Rhode Island Avenu...",2005
368,"1300, R Street Northwest, Dupont Circle, Washi...",2005
480,"Rhode Island Avenue Northwest, Logan Circle/Sh...",2005


#### Clean the broken ones.

In [None]:
## fix zipcodes ##

# the single nan should be 20005
final_df['zipcodes'] = final_df['zipcodes'].fillna(20005)

# 28 say 2005 which is incorrect - we must check what's in the gcode value, which contains the zip code, and then fix the zip field
final_df.loc[0,'zipcodes'] = 20009
final_df.loc[80,'zipcodes'] = 20005
final_df.loc[144,'zipcodes'] = 20009
final_df.loc[158,'zipcodes'] = 20009
final_df.loc[161,'zipcodes'] = 20009
final_df.loc[267,'zipcodes'] = 20009
final_df.loc[283,'zipcodes'] = 20009
final_df.loc[294,'zipcodes'] = 20005
final_df.loc[368,'zipcodes'] = 20009
final_df.loc[480,'zipcodes'] = 20005
final_df.loc[485,'zipcodes'] = 20005
final_df.loc[497,'zipcodes'] = 20005
final_df.loc[512,'zipcodes'] = 20005
final_df.loc[519,'zipcodes'] = 20005
final_df.loc[572,'zipcodes'] = 20005
final_df.loc[755,'zipcodes'] = 20009
final_df.loc[836,'zipcodes'] = 20005
final_df.loc[995,'zipcodes'] = 20005
final_df.loc[1017,'zipcodes'] = 20005
final_df.loc[1040,'zipcodes'] = 20005
final_df.loc[1073,'zipcodes'] = 20005
final_df.loc[1097,'zipcodes'] = 20009
final_df.loc[1176,'zipcodes'] = 20009
final_df.loc[585,'zipcodes'] = 20009
final_df.loc[617,'zipcodes'] = 20009
final_df.loc[633,'zipcodes'] = 20005
final_df.loc[646,'zipcodes'] = 20005
final_df.loc[743,'zipcodes'] = 20005

In [None]:
## identify string issues
# some have dashes or colons - need to fix
# change to string first so we can use string functions
final_df['zipcodes'] = final_df['zipcodes'].astype('string')
final_df[['gcode','zipcodes']].loc[final_df['zipcodes'].str.contains(':')]
final_df[['gcode','zipcodes']].loc[final_df['zipcodes'].str.contains('-')]
# df.loc[df['Name'].str.contains("pokemon", case=False)]

Unnamed: 0,gcode,zipcodes
7,"1500, M Street Northwest, Golden Triangle, Was...",20005-4111
27,"1600, R Street Northwest, Dupont Circle, Washi...",20009-5540
35,"2000, 14th Street Southeast, Anacostia, Washin...",20020-4706
45,"1300, 23rd Street Northwest, West End, Dupont ...",20036-5305
88,"Massachusetts Avenue Northwest, Dupont Circle,...",20036-5305
129,"1100, Sumner Road Southeast, Barry Farm Dwelli...",20373-5815
142,"1200, 25th Street Northwest, West End, Washing...",20036-5305
189,"1200, Pleasant Street Southeast, Anacostia, Wa...",20020-4706
190,"K Street NW (access road), Golden Triangle, Wa...",20006-5346
194,"1200, 20th Street Northwest, Golden Triangle, ...",20036-5305


In [None]:
# fix colons
final_df.loc[174,'zipcodes'] = '20016'
final_df.loc[205,'zipcodes'] = '20009'
final_df.loc[286,'zipcodes'] = '20001'
final_df.loc[298,'zipcodes'] = '20008'
final_df.loc[318,'zipcodes'] = '20001'
final_df.loc[330,'zipcodes'] = '20016'
final_df.loc[412,'zipcodes'] = '20001'
final_df.loc[600,'zipcodes'] = '20007'
final_df.loc[806,'zipcodes'] = '20001'
final_df.loc[848,'zipcodes'] = '20001'
final_df.loc[870,'zipcodes'] = '20018'
final_df.loc[900,'zipcodes'] = '20016'
final_df.loc[955,'zipcodes'] = '20016'
final_df.loc[967,'zipcodes'] = '20001'
final_df.loc[1146,'zipcodes'] = '20007'
final_df.loc[1177,'zipcodes'] = '20008'

# fix dashes (these are the rows listed in the output above, e.g. 7, 27, 35)
final_df.loc[7,'zipcodes'] = '20005'
final_df.loc[27,'zipcodes'] = '20009'
final_df.loc[35,'zipcodes'] = '20020'
final_df.loc[45,'zipcodes'] = '20037'
final_df.loc[88,'zipcodes'] = '20036'
final_df.loc[129,'zipcodes'] = '20020'
final_df.loc[142,'zipcodes'] = '20037'
final_df.loc[189,'zipcodes'] = '20020'
final_df.loc[190,'zipcodes'] = '20006'
final_df.loc[194,'zipcodes'] = '20036'
final_df.loc[242,'zipcodes'] = '20009'
final_df.loc[243,'zipcodes'] = '20009'
final_df.loc[274,'zipcodes'] = '20036'
final_df.loc[278,'zipcodes'] = '20009'
final_df.loc[284,'zipcodes'] = '20009'
final_df.loc[301,'zipcodes'] = '20036'
final_df.loc[332,'zipcodes'] = '20036'
final_df.loc[364,'zipcodes'] = '20009'
final_df.loc[396,'zipcodes'] = '20009'
final_df.loc[408,'zipcodes'] = '20009'
final_df.loc[437,'zipcodes'] = '20005'
final_df.loc[450,'zipcodes'] = '20036'
final_df.loc[487,'zipcodes'] = '20009'
final_df.loc[491,'zipcodes'] = '20007'
final_df.loc[564,'zipcodes'] = '20005'
final_df.loc[571,'zipcodes'] = '20036'
final_df.loc[580,'zipcodes'] = '20009'
final_df.loc[625,'zipcodes'] = '20036'
final_df.loc[634,'zipcodes'] = '20020'
final_df.loc[654,'zipcodes'] = '20009'
final_df.loc[709,'zipcodes'] = '20006'
final_df.loc[756,'zipcodes'] = '20006'
final_df.loc[758,'zipcodes'] = '20006'
final_df.loc[760,'zipcodes'] = '20036'
final_df.loc[768,'zipcodes'] = '20006'
final_df.loc[796,'zipcodes'] = '20006'
final_df.loc[805,'zipcodes'] = '20006'
final_df.loc[875,'zipcodes'] = '20037'
final_df.loc[876,'zipcodes'] = '20005'
final_df.loc[919,'zipcodes'] = '20037'
final_df.loc[942,'zipcodes'] = '20036'
final_df.loc[953,'zipcodes'] = '20036'
final_df.loc[999,'zipcodes'] = '20009'
final_df.loc[1035,'zipcodes'] = '20036'
final_df.loc[1047,'zipcodes'] = '20005'
final_df.loc[1054,'zipcodes'] = '20020'
final_df.loc[1094,'zipcodes'] = '20037'
final_df.loc[1144,'zipcodes'] = '20036'
final_df.loc[1192,'zipcodes'] = '20036'
final_df.loc[1193,'zipcodes'] = '20036'
final_df.loc[1209,'zipcodes'] = '20037'
final_df.loc[1241,'zipcodes'] = '20006'
final_df.loc[1242,'zipcodes'] = '20009'
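
After the manual fixes, a single pass could confirm that every zipcode is exactly five digits, rather than checking for `2005`, colons and dashes separately. A minimal sketch follows. `malformed_zip_rows` is a hypothetical helper, and the sample list only imitates the kinds of values described in this section.

```python
import re

FIVE_DIGITS = re.compile(r"\d{5}")

def malformed_zip_rows(zipcodes):
    """Return (position, value) pairs for entries that are not exactly five digits."""
    return [
        (i, z) for i, z in enumerate(zipcodes)
        if not (isinstance(z, str) and FIVE_DIGITS.fullmatch(z))
    ]

# Hypothetical values imitating the problems described in this section
sample = ["20009", "2005", "20005-4111", None, "20036"]
print(malformed_zip_rows(sample))
```

On the real data this could be called as `malformed_zip_rows(final_df['zipcodes'].astype(object).tolist())`. An empty list would mean the column is ready to save.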

## We save the file as 'hate_crimes-zipcode_cleaning_DONE.csv' to reference in our other notebooks.

In [None]:
# save the file 
final_df.to_csv('hate_crimes-zipcode_cleaning_DONE.csv')
