# Demo W39Wed:  Pandas

### 1. Series

**Creating series from dictionaries.** First, let's import the `pandas` module, define a dictionary, and see how we can turn it into a series.

In [1]:
import pandas as pd

In [2]:
pop_dict = {"Hovedstaden": 1.84,
            "Sjælland": 0.84,
            "Syddanmark": 1.22,
            "Midtjylland": 1.32,
            "Nordjylland": 0.59}

pop_series = pd.Series(pop_dict)

Let's look at the dictionary. It contains the by-now familiar keys and values.

In [3]:
pop_dict

{'Hovedstaden': 1.84,
 'Sjælland': 0.84,
 'Syddanmark': 1.22,
 'Midtjylland': 1.32,
 'Nordjylland': 0.59}

Now, let's look at the same data as a `pandas` series. This is much easier to read, since it comes in a tabular format with one column for indices and one column for values.

In [4]:
pop_series

Hovedstaden    1.84
Sjælland       0.84
Syddanmark     1.22
Midtjylland    1.32
Nordjylland    0.59
dtype: float64

**Accessing data in a series.** To draw specific content from a `pandas` series, we can use the string indices like in a dictionary.

In [57]:
pop_series['Midtjylland']

1.32

But we can also use numeric indices, as with lists, since series are ordered. Using a single numeric index returns the value stored in the series at the specified position.

In [58]:
pop_series[0]

1.84

If we instead specify a range of indices, i.e. slice the data, we get a subset of the series including the string indices, their associated values, and the data type (`dtype`) of the series.

In [59]:
pop_series[0:3]

Hovedstaden    1.84
Sjælland       0.84
Syddanmark     1.22
dtype: float64

**Question.** What is the type of the value returned if we specify one numeric index in the series? What is the type if we slice the series?

We can also select specific entries instead of a contiguous range. To do so, we index the series using double brackets `[[ ]]`: the inner brackets define a list of indices and the outer brackets index the series. The cell below first returns the first and last entries by their numeric positions, then returns two entries by their string labels.

In [60]:
print(pop_series[[0,-1]])
print(pop_series[['Hovedstaden','Syddanmark']])

Hovedstaden    1.84
Nordjylland    0.59
dtype: float64
Hovedstaden    1.84
Syddanmark     1.22
dtype: float64
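
Plain square brackets accept both labels and positions, which can be ambiguous, and newer pandas versions warn when an integer is used as a position on a series with string labels. A minimal sketch of the explicit alternatives, `.loc` for labels and `.iloc` for positions, follows. The series is rebuilt here so the block runs on its own, and the variable names are chosen for illustration only.

```python
import pandas as pd

# Rebuild the series from the demo so this block is self-contained
pop_series = pd.Series({"Hovedstaden": 1.84,
                        "Sjælland": 0.84,
                        "Syddanmark": 1.22,
                        "Midtjylland": 1.32,
                        "Nordjylland": 0.59})

# .loc selects by label, .iloc selects by integer position
by_label = pop_series.loc["Midtjylland"]
by_position = pop_series.iloc[0]

# Lists and slices of positions work with .iloc as well
first_and_last = pop_series.iloc[[0, -1]]
first_three = pop_series.iloc[0:3]

print(by_label, by_position)
print(first_and_last)
print(first_three)
```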


**Creating series from other data structures.** Now, let's see how we can create series from other data structures than dictionaries, starting with lists.

In [61]:
region_names = ["Hovedstaden",
                "Sjælland",
                "Syddanmark",
                "Midtjylland",
                "Nordjylland"]

pop = [1.85,
       0.83,
       1.22,
       1.33,
       0.59]

To create a series from two lists containing values and character indices, we can pass the values as the first argument to the `Series` command and specify the argument `index` as equal to the list of indices.

In [62]:
pop_series_from_lists = pd.Series(pop, index = region_names)

pop_series_from_lists

Hovedstaden    1.85
Sjælland       0.83
Syddanmark     1.22
Midtjylland    1.33
Nordjylland    0.59
dtype: float64

If we forget to specify the series indices when we create it, we can specify them after the fact by assigning the list of indices to the `index` attribute.

In [63]:
pop_series_from_lists = pd.Series(pop)

pop_series_from_lists

0    1.85
1    0.83
2    1.22
3    1.33
4    0.59
dtype: float64

In [64]:
pop_series_from_lists.index = region_names

pop_series_from_lists

Hovedstaden    1.85
Sjælland       0.83
Syddanmark     1.22
Midtjylland    1.33
Nordjylland    0.59
dtype: float64

**Numerical operations with series.** `Pandas` series can contain any type of data, and different commands apply to each type. For numeric data you might be interested in descriptive statistics, which we will cover in the next session. For all types of data, you can count the number of data points in your series using the `count` command. Note that this counts only non-missing data points, whereas `len` counts all entries.

In [65]:
pop_series_from_lists.count()

5

In [66]:
len(pop_series_from_lists)

5
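
In the series above, `count` and `len` agree because no value is missing. A minimal sketch of the difference, using a small series with one missing value (the data here are made up for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical series with one missing value
series_with_gap = pd.Series([1.85, np.nan, 1.22],
                            index=["Hovedstaden", "Sjælland", "Syddanmark"])

n_non_missing = series_with_gap.count()  # skips the missing entry
n_total = len(series_with_gap)           # counts every entry

print(n_non_missing, n_total)
```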

### 2. DataFrames

**Creating DataFrames from dictionaries.** In most applied cases, you will want more than one value associated with each observation. To accommodate this, `pandas` introduces `DataFrame` containers, which we can also construct from dictionaries.

Below, we first define two additional lists containing values for our regions, then we construct a dictionary from the four lists and keys we assign to them. Finally, we turn this complex dictionary containing four keys and values consisting of lists into a neat `DataFrame`.

In [67]:
covid_cases = [166730, 212168, 262330, 353572, 326801]
covid_vaccinations = [1.34, 0.63, 0.92, 1.00, 0.45]

complex_dictionary = {"Name": region_names,
                      "Population (Millions)": pop,
                      "Covid Cases": covid_cases,
                      "Vaccinations (Millions)": covid_vaccinations}

df_from_dict = pd.DataFrame(complex_dictionary)

What does the dictionary look like?

In [68]:
complex_dictionary

{'Name': ['Hovedstaden',
  'Sjælland',
  'Syddanmark',
  'Midtjylland',
  'Nordjylland'],
 'Population (Millions)': [1.85, 0.83, 1.22, 1.33, 0.59],
 'Covid Cases': [166730, 212168, 262330, 353572, 326801],
 'Vaccinations (Millions)': [1.34, 0.63, 0.92, 1.0, 0.45]}

And what does the same data look like in a `DataFrame` container?

In [69]:
df_from_dict

|  | Name | Population (Millions) | Covid Cases | Vaccinations (Millions) |
| --- | --- | --- | --- | --- |
| 0 | Hovedstaden | 1.85 | 166730 | 1.34 |
| 1 | Sjælland | 0.83 | 212168 | 0.63 |
| 2 | Syddanmark | 1.22 | 262330 | 0.92 |
| 3 | Midtjylland | 1.33 | 353572 | 1.0 |
| 4 | Nordjylland | 0.59 | 326801 | 0.45 |


**Creating DataFrames from other data structures.** First, we create a `DataFrame` from a list of tuples.

In [70]:
list_of_tuples = [("Hovedstaden", 1.84, 166730, 1.34),
                  ("Sjælland", 0.84, 212168, 0.63),
                  ("Syddanmark", 1.22, 262330, 0.92),
                  ("Midtjylland", 1.32, 353572, 1.00),
                  ("Nordjylland", 0.59, 326801,  0.45)]
df_from_list_of_tuples = pd.DataFrame(list_of_tuples)


This also creates a neat `DataFrame` but does not give us the column headings or variable names.

In [71]:
df_from_list_of_tuples

|  | 0 | 1 | 2 | 3 |
| --- | --- | --- | --- | --- |
| 0 | Hovedstaden | 1.84 | 166730 | 1.34 |
| 1 | Sjælland | 0.84 | 212168 | 0.63 |
| 2 | Syddanmark | 1.22 | 262330 | 0.92 |
| 3 | Midtjylland | 1.32 | 353572 | 1.0 |
| 4 | Nordjylland | 0.59 | 326801 | 0.45 |


**Naming columns.** We can add names to the columns using the `columns` attribute.

In [72]:
df_from_list_of_tuples.columns = ["Name", "Population (Millions)", "Covid Cases", "Vaccinations (Millions)"]

In [73]:
df_from_list_of_tuples

|  | Name | Population (Millions) | Covid Cases | Vaccinations (Millions) |
| --- | --- | --- | --- | --- |
| 0 | Hovedstaden | 1.84 | 166730 | 1.34 |
| 1 | Sjælland | 0.84 | 212168 | 0.63 |
| 2 | Syddanmark | 1.22 | 262330 | 0.92 |
| 3 | Midtjylland | 1.32 | 353572 | 1.0 |
| 4 | Nordjylland | 0.59 | 326801 | 0.45 |
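
The column names can also be supplied when the `DataFrame` is created, through the `columns` argument of `pd.DataFrame`. A minimal sketch, re-entering the list of tuples so the block runs on its own (the names `column_names` and `df_named` are chosen for illustration):

```python
import pandas as pd

list_of_tuples = [("Hovedstaden", 1.84, 166730, 1.34),
                  ("Sjælland", 0.84, 212168, 0.63),
                  ("Syddanmark", 1.22, 262330, 0.92),
                  ("Midtjylland", 1.32, 353572, 1.00),
                  ("Nordjylland", 0.59, 326801, 0.45)]

column_names = ["Name", "Population (Millions)",
                "Covid Cases", "Vaccinations (Millions)"]

# Pass the names at construction time instead of assigning them afterwards
df_named = pd.DataFrame(list_of_tuples, columns=column_names)

print(df_named)
```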


**Accessing data in a DataFrame.** As with a `pandas` `Series`, we can access the data in a `DataFrame` using string and numeric indices.

To access the `Name` column, we can index it by its string name.

In [74]:
df_from_dict['Name']

0    Hovedstaden
1       Sjælland
2     Syddanmark
3    Midtjylland
4    Nordjylland
Name: Name, dtype: object

Or we can access it as the attribute `Name` of the `DataFrame`.

In [75]:
df_from_dict.Name

0    Hovedstaden
1       Sjælland
2     Syddanmark
3    Midtjylland
4    Nordjylland
Name: Name, dtype: object

**Question.** What happens if we try to access the column "Vaccinations (Millions)" as an attribute of our `DataFrame`?

If we don't know the name of the column we want to index, we can look up all names using the `columns` attribute.

In [76]:
df_from_dict.columns

Index(['Name', 'Population (Millions)', 'Covid Cases',
       'Vaccinations (Millions)'],
      dtype='object')

Numeric slices let us access rows in the `DataFrame`. They work analogously to slicing lists, i.e. `[StartIndex:EndIndex:StepLength]`.

In [77]:
df_from_dict[0::2]

|  | Name | Population (Millions) | Covid Cases | Vaccinations (Millions) |
| --- | --- | --- | --- | --- |
| 0 | Hovedstaden | 1.85 | 166730 | 1.34 |
| 2 | Syddanmark | 1.22 | 262330 | 0.92 |
| 4 | Nordjylland | 0.59 | 326801 | 0.45 |


We can also select rows by passing a list of booleans (True/False) to the indexing brackets; rows marked `True` are kept. Note that there are two sets of brackets: the outer pair indexes the `DataFrame` and the inner pair defines the list of booleans.

In [78]:
df_from_dict[[True, False, True, False, False]]

|  | Name | Population (Millions) | Covid Cases | Vaccinations (Millions) |
| --- | --- | --- | --- | --- |
| 0 | Hovedstaden | 1.85 | 166730 | 1.34 |
| 2 | Syddanmark | 1.22 | 262330 | 0.92 |


As with `Series`, we can add a character `index` to our `DataFrame` that we can then use to index the values in specific cells. 

In [79]:
region_initials = ["HV", "SJ", "SD", "MD","ND"]

df_from_dict.index = region_initials

df_from_dict

|  | Name | Population (Millions) | Covid Cases | Vaccinations (Millions) |
| --- | --- | --- | --- | --- |
| HV | Hovedstaden | 1.85 | 166730 | 1.34 |
| SJ | Sjælland | 0.83 | 212168 | 0.63 |
| SD | Syddanmark | 1.22 | 262330 | 0.92 |
| MD | Midtjylland | 1.33 | 353572 | 1.0 |
| ND | Nordjylland | 0.59 | 326801 | 0.45 |


In [80]:
df_from_dict["Population (Millions)"]["MD"]

1.33

You can also use the `loc` indexer to access particular rows and columns based on their labels. `loc` and plain indexing are very similar, but `loc` can additionally select a single row or a list of rows based on labels.

To access a single row or set of rows:

In [81]:
df_from_dict.loc["SJ"]

Name                       Sjælland
Population (Millions)          0.83
Covid Cases                  212168
Vaccinations (Millions)        0.63
Name: SJ, dtype: object

In [82]:
df_from_dict.loc[["HV","MD"]]

|  | Name | Population (Millions) | Covid Cases | Vaccinations (Millions) |
| --- | --- | --- | --- | --- |
| HV | Hovedstaden | 1.85 | 166730 | 1.34 |
| MD | Midtjylland | 1.33 | 353572 | 1.0 |
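
`loc` also accepts a row label and a column label together, and its positional counterpart `iloc` selects rows by integer position. A minimal sketch on a reduced version of the regional table (only two columns are re-entered here to keep the block short and self-contained):

```python
import pandas as pd

df = pd.DataFrame(
    {"Name": ["Hovedstaden", "Sjælland", "Syddanmark",
              "Midtjylland", "Nordjylland"],
     "Population (Millions)": [1.85, 0.83, 1.22, 1.33, 0.59]},
    index=["HV", "SJ", "SD", "MD", "ND"],
)

# Row label and column label together return a single cell
cell = df.loc["MD", "Population (Millions)"]

# Lists of row labels and column labels return a smaller DataFrame
subset = df.loc[["HV", "MD"], ["Name"]]

# iloc works on integer positions, regardless of the string index
first_row = df.iloc[0]
every_other_row = df.iloc[0::2]

print(cell)
print(subset)
print(every_other_row)
```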


Beyond accessing rows through their position, we can also subset the `DataFrame` according to specific conditions. For example, we can look only at regions that have 1 million vaccinations or more.

In [83]:
df_from_dict[df_from_dict["Vaccinations (Millions)"] >= 1.0]

|  | Name | Population (Millions) | Covid Cases | Vaccinations (Millions) |
| --- | --- | --- | --- | --- |
| HV | Hovedstaden | 1.85 | 166730 | 1.34 |
| MD | Midtjylland | 1.33 | 353572 | 1.0 |


**Manipulating existing DataFrames.** So far, we have only asked Python to show us different slices of our data, but we can also manipulate the underlying data, for example by subsetting it permanently or adding rows or columns.

In addition to subsetting by inclusion, i.e. saying which columns we want to look at, we can also subset by exclusion using the `drop` command, setting the `axis` argument to `1` to drop columns and to `0` to drop rows. Note that `drop` returns a new `DataFrame` and leaves the original unchanged unless we assign the result.

In [84]:
df_from_dict.drop(["Covid Cases"], axis = 1)

|  | Name | Population (Millions) | Vaccinations (Millions) |
| --- | --- | --- | --- |
| HV | Hovedstaden | 1.85 | 1.34 |
| SJ | Sjælland | 0.83 | 0.63 |
| SD | Syddanmark | 1.22 | 0.92 |
| MD | Midtjylland | 1.33 | 1.0 |
| ND | Nordjylland | 0.59 | 0.45 |


In [85]:
df_from_dict.drop(["SD"], axis = 0)

|  | Name | Population (Millions) | Covid Cases | Vaccinations (Millions) |
| --- | --- | --- | --- | --- |
| HV | Hovedstaden | 1.85 | 166730 | 1.34 |
| SJ | Sjælland | 0.83 | 212168 | 0.63 |
| MD | Midtjylland | 1.33 | 353572 | 1.0 |
| ND | Nordjylland | 0.59 | 326801 | 0.45 |
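
As the two outputs suggest, `drop` hands back a new `DataFrame` rather than changing the one it is called on. A minimal sketch that makes this explicit, using the `columns` and `index` keyword arguments of `drop` as an alternative to `axis` (the reduced table and variable names are assumptions for illustration):

```python
import pandas as pd

df = pd.DataFrame(
    {"Name": ["Hovedstaden", "Sjælland", "Syddanmark"],
     "Covid Cases": [166730, 212168, 262330]},
    index=["HV", "SJ", "SD"],
)

# Equivalent to df.drop(["Covid Cases"], axis=1)
without_cases = df.drop(columns=["Covid Cases"])

# Equivalent to df.drop(["SD"], axis=0)
without_sd = df.drop(index=["SD"])

# The original is untouched because the results were stored under new names
print(df.columns.tolist())
print(without_cases.columns.tolist())
print(without_sd.index.tolist())
```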


The cell below creates a subset of our original data containing only the regions that have vaccinated at least 1 million people.

In [86]:
df_from_dict_highVaccination = df_from_dict[df_from_dict["Vaccinations (Millions)"] >= 1.0]

df_from_dict_highVaccination

|  | Name | Population (Millions) | Covid Cases | Vaccinations (Millions) |
| --- | --- | --- | --- | --- |
| HV | Hovedstaden | 1.85 | 166730 | 1.34 |
| MD | Midtjylland | 1.33 | 353572 | 1.0 |


To add a row to our existing `DataFrame`, we first need to define the new row, turn it into a `DataFrame` object, and then use the `append` command. Note that we set the `ignore_index` argument to `True` here, which removes our string row indices and gives us consistent numeric row indices.

In [87]:
new_row = [{"Name": "Ontario",
            "Population (Millions)": 14.57,
            "Covid Cases": 582635,
            "Vaccinations (Millions)": 10.46}]

new_row_df = pd.DataFrame(new_row)

df_from_dict_longer = df_from_dict.append(new_row_df, ignore_index = True)

df_from_dict_longer

|  | Name | Population (Millions) | Covid Cases | Vaccinations (Millions) |
| --- | --- | --- | --- | --- |
| 0 | Hovedstaden | 1.85 | 166730 | 1.34 |
| 1 | Sjælland | 0.83 | 212168 | 0.63 |
| 2 | Syddanmark | 1.22 | 262330 | 0.92 |
| 3 | Midtjylland | 1.33 | 353572 | 1.0 |
| 4 | Nordjylland | 0.59 | 326801 | 0.45 |
| 5 | Ontario | 14.57 | 582635 | 10.46 |
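
`DataFrame.append` was deprecated in pandas 1.4 and removed in pandas 2.0, so the cell above only runs on older versions. A minimal sketch of the same step with `pd.concat`, which takes the same `ignore_index` argument (the table is reduced to two regions here to keep the block short):

```python
import pandas as pd

df = pd.DataFrame(
    {"Name": ["Hovedstaden", "Sjælland"],
     "Population (Millions)": [1.85, 0.83],
     "Covid Cases": [166730, 212168],
     "Vaccinations (Millions)": [1.34, 0.63]},
    index=["HV", "SJ"],
)

new_row_df = pd.DataFrame([{"Name": "Ontario",
                            "Population (Millions)": 14.57,
                            "Covid Cases": 582635,
                            "Vaccinations (Millions)": 10.46}])

# Stack the two frames vertically and renumber the rows from 0
df_longer = pd.concat([df, new_row_df], ignore_index=True)

print(df_longer)
```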


If we did not set the `ignore_index` argument to `True`, we would either end up with inconsistent row indices, as in the cell below:

In [88]:
df_from_dict_inconsistent = df_from_dict.append(new_row_df)

df_from_dict_inconsistent

|  | Name | Population (Millions) | Covid Cases | Vaccinations (Millions) |
| --- | --- | --- | --- | --- |
| HV | Hovedstaden | 1.85 | 166730 | 1.34 |
| SJ | Sjælland | 0.83 | 212168 | 0.63 |
| SD | Syddanmark | 1.22 | 262330 | 0.92 |
| MD | Midtjylland | 1.33 | 353572 | 1.0 |
| ND | Nordjylland | 0.59 | 326801 | 0.45 |
| 0 | Ontario | 14.57 | 582635 | 10.46 |


Or we would end up with duplicate indices, which can lead to unexpected indexing behavior.

If you are changing values, it is also safer to use `loc`, because straight indexing tends to return a copy, so you may not be modifying the original `DataFrame`.

In [89]:
df_from_dict.loc["SJ","Name"] = "Zealand"

In [90]:
df_from_dict

|  | Name | Population (Millions) | Covid Cases | Vaccinations (Millions) |
| --- | --- | --- | --- | --- |
| HV | Hovedstaden | 1.85 | 166730 | 1.34 |
| SJ | Zealand | 0.83 | 212168 | 0.63 |
| SD | Syddanmark | 1.22 | 262330 | 0.92 |
| MD | Midtjylland | 1.33 | 353572 | 1.0 |
| ND | Nordjylland | 0.59 | 326801 | 0.45 |


We can also add a new column to our `DataFrame` based on some of the other data. Adding a column is similar to adding a new key/value in a dictionary. You can just assign the new values to the column.

In [91]:
df_from_dict['Proportion Vaccinated'] = df_from_dict["Vaccinations (Millions)"] / df_from_dict["Population (Millions)"]

In [92]:
df_from_dict

|  | Name | Population (Millions) | Covid Cases | Vaccinations (Millions) | Proportion Vaccinated |
| --- | --- | --- | --- | --- | --- |
| HV | Hovedstaden | 1.85 | 166730 | 1.34 | 0.724324 |
| SJ | Zealand | 0.83 | 212168 | 0.63 | 0.759036 |
| SD | Syddanmark | 1.22 | 262330 | 0.92 | 0.754098 |
| MD | Midtjylland | 1.33 | 353572 | 1.0 | 0.75188 |
| ND | Nordjylland | 0.59 | 326801 | 0.45 | 0.762712 |


Finally, maybe we want to update the "Proportion Vaccinated" column so that it says "Highly Vaccinated" in all regions with a vaccination rate above 75%. With `.loc` you can specify both the rows and the column to update. In general, though, we probably don't want to do this, because the column then contains two different types, and "Highly Vaccinated" doesn't really say anything about the proportion.

In [94]:
df_from_dict.loc[df_from_dict["Proportion Vaccinated"] > 0.75, ["Proportion Vaccinated"]] = "Highly Vaccinated"

df_from_dict

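TypeError: '>' not supported between instances of 'str' and 'float'

The error above most likely comes from running the cell a second time: after the first run, the column already contains the string "Highly Vaccinated", which cannot be compared with `0.75`. This is one more reason to avoid mixing types in a column.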
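
One way to flag highly vaccinated regions without mixing types is to keep the numeric column and add a separate boolean column. A minimal sketch; the column name `Highly Vaccinated` is an assumption chosen for illustration, and the data are re-entered so the block runs on its own:

```python
import pandas as pd

df = pd.DataFrame(
    {"Name": ["Hovedstaden", "Zealand", "Syddanmark",
              "Midtjylland", "Nordjylland"],
     "Population (Millions)": [1.85, 0.83, 1.22, 1.33, 0.59],
     "Vaccinations (Millions)": [1.34, 0.63, 0.92, 1.00, 0.45]},
    index=["HV", "SJ", "SD", "MD", "ND"],
)

df["Proportion Vaccinated"] = (df["Vaccinations (Millions)"]
                               / df["Population (Millions)"])

# A separate True/False column keeps "Proportion Vaccinated" purely numeric,
# so this cell can be re-run without raising a TypeError
df["Highly Vaccinated"] = df["Proportion Vaccinated"] > 0.75

print(df[["Name", "Proportion Vaccinated", "Highly Vaccinated"]])
```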

**Merging DataFrames.** Another thing you may want to do is merge tables.

Consider the two pandas dataframes `df1` and `df2` below. 

In [43]:
# Import numpy's random module to generate random numbers
from numpy import random as rand

df1 = pd.DataFrame(
    rand.randint(0, 10, size=(3, 4)),
    index=[0, 1, 2], columns=['a', 'b', 'c', 'd']
)

df2 = pd.DataFrame(
    rand.randint(0, 10, size=(5, 3)),
    index=[1, 2, 3, 4, 5], columns=['c', 'd', 'e']
)
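
Because the values are random, every run of the cell above gives different tables, and the outputs shown in this demo will not match a fresh run. A minimal sketch of a reproducible variant using numpy's `default_rng` with a fixed seed (the seed value 42 is an arbitrary choice, not part of the original demo):

```python
import numpy as np
import pandas as pd

# A seeded generator returns the same numbers on every run
rng = np.random.default_rng(seed=42)

df1 = pd.DataFrame(
    rng.integers(0, 10, size=(3, 4)),  # integers from 0 to 9
    index=[0, 1, 2], columns=['a', 'b', 'c', 'd']
)

df2 = pd.DataFrame(
    rng.integers(0, 10, size=(5, 3)),
    index=[1, 2, 3, 4, 5], columns=['c', 'd', 'e']
)

print(df1)
print(df2)
```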

We can join two tables using `pd.concat`. The `axis` argument selects the direction of the join: `axis = 0` joins vertically (stacking rows) and `axis = 1` joins horizontally (placing columns side by side).

You can also do an "inner" join that is the intersection of the frames, or an "outer" join for the union.

In [44]:
pd.concat([df1,df2], axis = 0, join = "inner")

|  | c | d |
| --- | --- | --- |
| 0 | 4 | 9 |
| 1 | 0 | 6 |
| 2 | 5 | 0 |
| 1 | 9 | 5 |
| 2 | 2 | 1 |
| 3 | 1 | 6 |
| 4 | 7 | 1 |
| 5 | 1 | 0 |


Note that the vertical inner join keeps only the columns the two tables have in common, but keeps all the rows.

Note also the overlapping indices. It is better to re-index the result so that the row indices are unique.

In [45]:
pd.concat([df1,df2], axis = 0, join = "inner", ignore_index = True)

|  | c | d |
| --- | --- | --- |
| 0 | 4 | 9 |
| 1 | 0 | 6 |
| 2 | 5 | 0 |
| 3 | 9 | 5 |
| 4 | 2 | 1 |
| 5 | 1 | 6 |
| 6 | 7 | 1 |
| 7 | 1 | 0 |


If we want to keep all of the columns (horizontal join) and take only the overlapping rows, we use `axis = 1`. Note that for an inner join `pd.concat` keeps the rows whose index appears in both tables and discards all others.

In [46]:
pd.concat([df1,df2], axis = 1, join = "inner")

|  | a | b | c | d | c | d | e |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | 9 | 8 | 0 | 6 | 9 | 5 | 9 |
| 2 | 0 | 1 | 5 | 0 | 2 | 1 | 8 |


On the other hand, if we don't want to discard rows, but want to add the tables together, we want an outer join. 

In [47]:
pd.concat([df1,df2], axis = 0, join = "outer", sort = True, ignore_index = True)

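|  | a | b | c | d | e |
| --- | --- | --- | --- | --- | --- |
| 0 | 4.0 | 8.0 | 4 | 9 | NaN |
| 1 | 9.0 | 8.0 | 0 | 6 | NaN |
| 2 | 0.0 | 1.0 | 5 | 0 | NaN |
| 3 | NaN | NaN | 9 | 5 | 9.0 |
| 4 | NaN | NaN | 2 | 1 | 8.0 |
| 5 | NaN | NaN | 1 | 6 | 4.0 |
| 6 | NaN | NaN | 7 | 1 | 1.0 |
| 7 | NaN | NaN | 1 | 0 | 5.0 |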


In [48]:
pd.concat([df1,df2], axis = 1, join = "outer")

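|  | a | b | c | d | c | d | e |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 4.0 | 8.0 | 4.0 | 9.0 | NaN | NaN | NaN |
| 1 | 9.0 | 8.0 | 0.0 | 6.0 | 9.0 | 5.0 | 9.0 |
| 2 | 0.0 | 1.0 | 5.0 | 0.0 | 2.0 | 1.0 | 8.0 |
| 3 | NaN | NaN | NaN | NaN | 1.0 | 6.0 | 4.0 |
| 4 | NaN | NaN | NaN | NaN | 7.0 | 1.0 | 1.0 |
| 5 | NaN | NaN | NaN | NaN | 1.0 | 0.0 | 5.0 |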


Note the difference between a horizontal and vertical outer join. 

In a vertical outer join, all of the rows are kept, with overlapping columns merged where possible and null values filled in for the missing elements. 

In a horizontal outer join, all of the columns are kept, row indices are matched where possible, and null values are filled in for the missing elements.
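
The four combinations of `axis` and `join` can be compared by looking at the shape of each result. A minimal sketch with fixed values instead of random ones, so the shapes can be checked by hand (the values and the variable names are made up for illustration):

```python
import pandas as pd

# Same layout as df1 and df2 above: columns c and d and rows 1 and 2 overlap
df1 = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6],
                    "c": [7, 8, 9], "d": [10, 11, 12]},
                   index=[0, 1, 2])
df2 = pd.DataFrame({"c": [1, 2, 3, 4, 5], "d": [6, 7, 8, 9, 10],
                    "e": [11, 12, 13, 14, 15]},
                   index=[1, 2, 3, 4, 5])

vertical_inner = pd.concat([df1, df2], axis=0, join="inner")    # shared columns only
vertical_outer = pd.concat([df1, df2], axis=0, join="outer")    # all columns
horizontal_inner = pd.concat([df1, df2], axis=1, join="inner")  # shared rows only
horizontal_outer = pd.concat([df1, df2], axis=1, join="outer")  # all rows

for name, frame in [("vertical inner", vertical_inner),
                    ("vertical outer", vertical_outer),
                    ("horizontal inner", horizontal_inner),
                    ("horizontal outer", horizontal_outer)]:
    print(name, frame.shape)
```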

### 3. Importing data with Pandas

As we've seen in previous sessions, `Pandas` also has built-in functionality to import all kinds of tabular data which we can then manipulate with our knowledge about how to work with `DataFrames`. 

In [49]:
url = 'https://dl.dropboxusercontent.com/s/9war4suj1s5j1ah/sodas_people_twitter_scholar.csv?dl=1'

sodas_people_df = pd.read_csv(url)

sodas_people_df

Unnamed: 0,description,role,twitter,google_scholar,mail,name,twitter_handle,twitter_created_at,twitter_bio,twitter_followers_n,...,twitter_verified,gs_cites_all_time,gs_cites_since_2015,gs_h_index_all_time,gs_i10_index_all_time,gs_most_cited_title,gs_most_cited_authors,gs_most_cited_outlet,gs_most_cited_year,gs_most_cited_cites
0,David Dreyer Lassen is the Director of SODAS a...,SODAS steering committee,https://twitter.com/daviddlassen,https://scholar.google.dk/citations?user=aRBQc...,david.dreyer.lassen@econ.ku.dk,David Dreyer Lassen,daviddlassen,2013-11-23 11:36:43,"Chairman, Independent Research Fund Denmark (D...",1138.0,...,False,3991.0,2170.0,21.0,32.0,"Fiscal transparency, political parties, and de...","JE Alt, DD Lassen","European Economic Review 50 (6), 1403-1439, 2006",2006.0,695.0
1,Morten Axel Pedersen is Deputy Director of SOD...,SODAS steering committee,,https://scholar.google.ca/citations?user=4vDlk...,map@sodas.ku.dk,Morten Axel Pedersen,,,,,...,,3474.0,2595.0,28.0,38.0,The Ontological Turn: An Anthropological Expos...,"M Holbraad, MA Pedersen","Cambridge: Cambridge University Press, 2017",2017.0,408.0
2,Rebecca Adler-Nissen is Professor in Political...,SODAS steering committee,https://twitter.com/rebadlernissen?lang=da,https://scholar.google.dk/citations?user=lazTX...,ran@ifs.ku.dk,Rebecca Adler-Nissen,rebadlernissen,2013-05-22 20:37:23,Professor of Political Science • International...,6113.0,...,False,2675.0,2214.0,24.0,41.0,Stigma Management in International Relations: ...,R Adler-Nissen,"International Organization 68 (1), 143-176, 2014",2014.0,286.0
3,Sune Lehmann is a Professor of Complexity and ...,SODAS steering committee,https://twitter.com/suneman,https://scholar.google.com/citations?user=wvkU...,sljo@dtu.dk,Sune Lehmann,suneman,2008-09-07 22:10:37,One of the leading leaders of unusual methods,3028.0,...,False,6099.0,4310.0,30.0,49.0,Link communities reveal multiscale complexity ...,"YY Ahn, JP Bagrow, S Lehmann","Nature 466 (7307), 761-764, 2010",2010.0,1922.0
4,Anders Blok is Associate Professor in Sociolog...,SODAS steering committee,,,abl@soc.ku.dk,Anders Blok,,,,,...,,,,,,,,,,
5,Søren Kyllingsbæk is Professor in Cognitive Ps...,SODAS steering committee,,https://scholar.google.com/citations?user=TIMC...,sk@psy.ku.dk,Søren Kyllingsbæk,,,,,...,,2807.0,1277.0,24.0,35.0,A neural theory of visual attention: bridging ...,"C Bundesen, T Habekost, S Kyllingsbæk","Psychological review 112 (2), 291, 2005",2005.0,623.0
6,Robert Böhm is a Professor of Applied Social P...,SODAS steering committee,https://twitter.com/robert_bohm,,rb@psy.ku.dk,Robert Böhm,robert_bohm,2015-09-25 23:29:06,Professor of Applied Social Psychology and Beh...,869.0,...,False,,,,,,,,,
7,Andreas Bjerre-Nielsen ia Head of Studies at t...,Assistant Professor,https://twitter.com/andbjn,https://scholar.google.dk/citations?user=fRnm_...,abn@sodas.ku.dk,Andreas Bjerre-Nielsen,andbjn,2011-05-29 13:35:14,Asst. Prof. in Econonomics and Social Data Sci...,331.0,...,False,83.0,83.0,2.0,2.0,"Class attendance, peer similarity, and academi...","V Kassarnig, A Bjerre-Nielsen, E Mones, S Lehm...","PloS one 12 (11), e0187078, 2017",2017.0,50.0
8,Frederik Hjorth is Assistant Professor at the ...,Assistant Professor,https://twitter.com/fghjorth,https://scholar.google.com/citations?user=gRyj...,fh@ifs.ku.dk,Frederik Hjorth,fghjorth,2011-07-19 09:01:42,TT assistant professor at @polscicph @uni_cope...,3274.0,...,False,192.0,189.0,7.0,6.0,Who benefits? Welfare chauvinism and national ...,F Hjorth,"European Union Politics, 2015",2015.0,55.0
9,Kristoffer Albris is an Assistant Professor at...,Assistant Professor,,,kristoffer.albris@sodas.ku.dk,Kristoffer Albris,,,,,...,,,,,,,,,,
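
`pd.read_csv` accepts file paths and file-like objects as well as URLs. A minimal sketch that reads a tiny CSV from an in-memory string, which is handy for trying things out without a network connection. The two rows are invented for illustration and are not taken from the SODAS file:

```python
import io

import pandas as pd

# Hypothetical CSV content; the second row has no follower count
csv_text = (
    "name,role,twitter_followers_n\n"
    "Person A,Assistant Professor,120\n"
    "Person B,PhD Student,\n"
)

people_df = pd.read_csv(io.StringIO(csv_text))

# Empty fields are read as missing values (NaN)
print(people_df)
print(people_df.isna().sum())
```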


Now, with the SODAS people in a `DataFrame`, we can easily subset the data. For example, if we only want to look at people with a Twitter profile, we can use conditional selection. Note the use of `~` instead of `not`: `~` negates the condition row by row, whereas `not` would try to evaluate the series as a whole.

Similarly, use `&` instead of `and`, and `|` instead of `or`.

In [50]:
sodas_twitter_people = sodas_people_df[~(pd.isna(sodas_people_df["twitter"]))]

sodas_twitter_people

Unnamed: 0,description,role,twitter,google_scholar,mail,name,twitter_handle,twitter_created_at,twitter_bio,twitter_followers_n,...,twitter_verified,gs_cites_all_time,gs_cites_since_2015,gs_h_index_all_time,gs_i10_index_all_time,gs_most_cited_title,gs_most_cited_authors,gs_most_cited_outlet,gs_most_cited_year,gs_most_cited_cites
0,David Dreyer Lassen is the Director of SODAS a...,SODAS steering committee,https://twitter.com/daviddlassen,https://scholar.google.dk/citations?user=aRBQc...,david.dreyer.lassen@econ.ku.dk,David Dreyer Lassen,daviddlassen,2013-11-23 11:36:43,"Chairman, Independent Research Fund Denmark (D...",1138.0,...,False,3991.0,2170.0,21.0,32.0,"Fiscal transparency, political parties, and de...","JE Alt, DD Lassen","European Economic Review 50 (6), 1403-1439, 2006",2006.0,695.0
2,Rebecca Adler-Nissen is Professor in Political...,SODAS steering committee,https://twitter.com/rebadlernissen?lang=da,https://scholar.google.dk/citations?user=lazTX...,ran@ifs.ku.dk,Rebecca Adler-Nissen,rebadlernissen,2013-05-22 20:37:23,Professor of Political Science • International...,6113.0,...,False,2675.0,2214.0,24.0,41.0,Stigma Management in International Relations: ...,R Adler-Nissen,"International Organization 68 (1), 143-176, 2014",2014.0,286.0
3,Sune Lehmann is a Professor of Complexity and ...,SODAS steering committee,https://twitter.com/suneman,https://scholar.google.com/citations?user=wvkU...,sljo@dtu.dk,Sune Lehmann,suneman,2008-09-07 22:10:37,One of the leading leaders of unusual methods,3028.0,...,False,6099.0,4310.0,30.0,49.0,Link communities reveal multiscale complexity ...,"YY Ahn, JP Bagrow, S Lehmann","Nature 466 (7307), 761-764, 2010",2010.0,1922.0
6,Robert Böhm is a Professor of Applied Social P...,SODAS steering committee,https://twitter.com/robert_bohm,,rb@psy.ku.dk,Robert Böhm,robert_bohm,2015-09-25 23:29:06,Professor of Applied Social Psychology and Beh...,869.0,...,False,,,,,,,,,
7,Andreas Bjerre-Nielsen ia Head of Studies at t...,Assistant Professor,https://twitter.com/andbjn,https://scholar.google.dk/citations?user=fRnm_...,abn@sodas.ku.dk,Andreas Bjerre-Nielsen,andbjn,2011-05-29 13:35:14,Asst. Prof. in Econonomics and Social Data Sci...,331.0,...,False,83.0,83.0,2.0,2.0,"Class attendance, peer similarity, and academi...","V Kassarnig, A Bjerre-Nielsen, E Mones, S Lehm...","PloS one 12 (11), e0187078, 2017",2017.0,50.0
8,Frederik Hjorth is Assistant Professor at the ...,Assistant Professor,https://twitter.com/fghjorth,https://scholar.google.com/citations?user=gRyj...,fh@ifs.ku.dk,Frederik Hjorth,fghjorth,2011-07-19 09:01:42,TT assistant professor at @polscicph @uni_cope...,3274.0,...,False,192.0,189.0,7.0,6.0,Who benefits? Welfare chauvinism and national ...,F Hjorth,"European Union Politics, 2015",2015.0,55.0
10,Friedolin Merhout is an Assistant Professor in...,Assistant Professor,https://twitter.com/fmerhout,https://scholar.google.com/citations?user=LexX...,fmerhout@sodas.ku.dk,Friedolin Merhout,fmerhout,2015-09-04 16:18:02,Assistant Professor of Sociology @uni_copenhag...,664.0,...,False,332.0,330.0,3.0,2.0,Exposure to opposing views on social media can...,"CA Bail, LP Argyle, TW Brown, JP Bumpus, H Che...",Proceedings of the National Academy of Science...,2018.0,306.0
11,Gregory Eady is an Assistant Professor in the ...,Assistant Professor,https://twitter.com/GregoryEady,,gregory.eady@sodas.ku.dk,Gregory Eady,GregoryEady,2010-06-15 01:47:01,Assistant professor at the University of Copen...,1031.0,...,False,,,,,,,,,
23,Kelton Ray Minor is a PhD student at SODAS and...,PhD Student,https://twitter.com/keltonminor,https://scholar.google.com/citations?user=sFys...,kmi@samf.ku.dk,Kelton Ray Minor,keltonminor,2013-12-17 22:33:58,Social and Behavioral Data Scientist. Research...,254.0,...,False,5.0,5.0,2.0,0.0,Inferring transportation mode from smartphone ...,"A Bjerre-Nielsen, K Minor, P Sapieżyński, S Le...","PloS one 15 (7), e0234003, 2020",2020.0,2.0
30,Snorre Ralund has a Bachelor and Master of Soc...,Research Assistant,https://twitter.com/SnorreRalund,,jser@econ.ku.dk,Snorre Ralund,SnorreRalund,2013-06-11 07:49:12,,85.0,...,False,,,,,,,,,
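
Conditions can be combined with `&` and `|`, as long as each comparison is wrapped in parentheses or stored in its own variable first. A minimal sketch on a small invented table that borrows the `role` and `twitter` column names from the data above (the rows and URLs are hypothetical):

```python
import numpy as np
import pandas as pd

# Hypothetical people; NaN marks a missing Twitter profile
people_df = pd.DataFrame({
    "name": ["A", "B", "C", "D"],
    "role": ["Assistant Professor", "Assistant Professor",
             "PhD Student", "PhD Student"],
    "twitter": ["https://example.org/a", np.nan,
                "https://example.org/c", np.nan],
})

has_twitter = ~people_df["twitter"].isna()          # row-wise "not missing"
is_assistant_prof = people_df["role"] == "Assistant Professor"

both = people_df[has_twitter & is_assistant_prof]    # row-wise "and"
either = people_df[has_twitter | is_assistant_prof]  # row-wise "or"

print(both)
print(either)
```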


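The `isin` method keeps the rows whose value matches any entry in a list. Note that the cell below relies on a `DataFrame` called `us_deaths_2016`, which is not defined in this demo, so it raises a `NameError` here.

In [51]:
accidents_alzheimer_2016 = us_deaths_2016[us_deaths_2016["Cause Name"].isin(["Unintentional injuries", "Alzheimer's disease"])]

accidents_alzheimer_2016

NameError: name 'us_deaths_2016' is not defined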