# \#72 City travel
If you don't walk or cycle, are you more likely to drive or to take public transportation? If we restrict the question to how people get to work, we can probably get clearer results, and perhaps even learn something about how different cities operate.

A recent article in the Economist (https://www.economist.com/interactive/2024-walkable-cities) summarized a paper by researchers Rafael Prieto-Curiel and Juan P. Ospina, published in the journal Environment International. Their paper, "The ABC of mobility" (https://www.sciencedirect.com/science/article/pii/S0160412024001272), looks at the "modal share" of three types of transportation in a city: active (walking/biking), public transportation, and driving. The researchers collected data from many studies of how people commute to work, covering a wide variety of cities, so that they could compare how residents of each city get around. They then looked at how that mix relates to other factors, such as a city's income, population, and location in the world.

I should add that the "ABC" in the title refers to the three types of mobility that the researchers discuss:
- A stands for "active," and includes walking and biking
- B stands for "bus," and is the overall category for public transportation
- C stands for "car," and describes people who drive to work
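In the rows shown in the output below, A equals walking plus cycling, and A, B, and C together add up to 1 (for Aachen in 2003: 0.34 + 0.14 + 0.52). Here's a minimal sketch of a sanity check for those relationships, assuming they are meant to hold throughout the dataset; it only checks three rows copied from that output, and the 0.001 tolerance is an arbitrary choice:

```python
import pandas as pd

# Three rows copied from the data shown in this post (Aachen 2003, Aachen 2017, Aarhus 2004)
shares = pd.DataFrame({
    'City':    ['Aachen', 'Aachen', 'Aarhus'],
    'year':    [2003, 2017, 2004],
    'Walking': [0.240000, 0.297030, 0.072917],
    'Cycling': [0.100000, 0.108911, 0.281250],
    'Active':  [0.340000, 0.405940, 0.354167],
    'Bus':     [0.140000, 0.128710, 0.197917],
    'Car':     [0.520000, 0.465350, 0.447917],
})

TOLERANCE = 1e-3  # arbitrary allowance for rounding in the source data

# A (active) should be the sum of its two components
active_ok = ((shares['Walking'] + shares['Cycling']) - shares['Active']).abs() < TOLERANCE

# A + B + C should account for (nearly) all commuters
abc_total = shares[['Active', 'Bus', 'Car']].sum(axis='columns')
total_ok = (abc_total - 1).abs() < TOLERANCE

print(active_ok.all(), total_ok.all())
```

On the full data frame, the same two expressions would flag any rows where the shares don't line up.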

They found, not surprisingly, that cities in North America rely on cars more than cities in the rest of the world do.

Even in New York, a city in which people famously walk and use public transportation quite a lot, many people still commute by car. A proposal to charge drivers extra for entering the most congested parts of Manhattan during peak hours was about to go into effect on June 30th, but New York Governor Kathy Hochul suspended the plan earlier this month (https://www.nytimes.com/2024/06/05/nyregion/congestion-pricing-pause-hochul.html?unlocked_article_code=1.2k0.xMmn.a7Q2Lxo3J885&smid=url-share). The pause reduces the incentive to use public transportation, while also adding a $1 billion hole to the public transportation budget.

## Data and seven questions
The data is available at https://github.com/rafaelprietocuriel/ModalShare/blob/main/ModalShare.csv
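Note that this is the address of GitHub's web page for the file; passing it to read_csv would fetch that HTML page rather than the CSV itself. A minimal sketch, assuming GitHub's usual raw.githubusercontent.com layout (owner/repo/branch/path), that converts one into the other; the helper name github_raw_url is hypothetical:

```python
from urllib.parse import urlparse

def github_raw_url(blob_url: str) -> str:
    """Convert a github.com '.../blob/<branch>/<path>' URL into its raw-file URL.

    Hypothetical helper; assumes the standard raw.githubusercontent.com layout.
    """
    parsed = urlparse(blob_url)
    if parsed.netloc != 'github.com':
        raise ValueError(f'Not a github.com URL: {blob_url}')
    # The path looks like /<owner>/<repo>/blob/<branch>/<path...>
    # (too few parts makes the unpacking itself raise ValueError)
    owner, repo, kind, *rest = parsed.path.strip('/').split('/')
    if kind != 'blob' or len(rest) < 2:
        raise ValueError(f'Not a blob URL: {blob_url}')
    return f'https://raw.githubusercontent.com/{owner}/{repo}/' + '/'.join(rest)

url = github_raw_url(
    'https://github.com/rafaelprietocuriel/ModalShare/blob/main/ModalShare.csv')
print(url)
```

With network access, the result can then be handed straight to pd.read_csv instead of a local filename.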

## Challenges
The learning goals include multi-indexes, grouping, pivot tables, and plotting, among others.

- Import the CSV file into a data frame. We'll want a three-part multi-index from the region, Country, and City columns.
- How many distinct cities were represented in this research? Which 10 cities were surveyed the most times?


In [23]:
import pandas as pd

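df = pd.read_csv('ModalShare.csv')                # assumes ModalShare.csv has been downloaded to the working directory
df = df.set_index(['region', 'Country', 'City'])  # three-level multi-index: region, country, city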
df

| region | Country | City | CityID | ObsID | year | LastObservation | metro_names | continent | subregion | state_name | state_abbr | population | ... | Walking | Cycling | Motorbykes | Active | Bus | Car | IncomeGroup | DataSource | DataLink | GDPPP 2022 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Europe | Germany | Aachen | M10001 | ID0005 | 2003 | NO | Aachen | Europe | Western Europe |  |  | 256605 | ... | 0.240000 | 0.100000 |  | 0.340000 | 0.140000 | 0.520000 | High income | The European Platform on Mobility Management | https://epomm.eu/ | 52745.75571 |
| Europe | Germany | Aachen | M10001 | ID0004 | 2017 | YES | Aachen | Europe | Western Europe |  |  | 256605 | ... | 0.297030 | 0.108911 |  | 0.405940 | 0.128710 | 0.465350 | High income | Mobilität in Deutschland − MiD | https://www.aachen.de/de/stadt_buerger/verkehr... | 52745.75571 |
| Europe | Denmark | Aarhus | M10002 | ID0008 | 2004 | YES | Aarhus | Europe | Northern Europe |  |  | 331000 | ... | 0.072917 | 0.281250 |  | 0.354167 | 0.197917 | 0.447917 | High income | Urban Audit | https://ec.europa.eu/eurostat/web/gisco/geodat... | 67967.38187 |
| Europe | United Kingdom | Aberdeen | M10003 | ID0009 | 2008 | YES | Aberdeen | Europe | Northern Europe |  |  | 211000 | ... | 0.297030 | 0.019802 |  | 0.316830 | 0.178220 | 0.504950 | High income | The European Platform on Mobility Management | https://epomm.eu/ | 48866.60396 |
| North America | United States | Abilene | M10004 | C001 | 2019 | YES | Abilene, TX Metro Area | North America | Northern America | Texas | TX | 164878 | ... |  |  |  | 0.009901 | 0.002200 | 0.987899 | High income | US CENSUS DATA - Commuting | https://censusreporter.org/ | 81695.18707 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| East Asia and Pacific | China | Zhuzhou | M10877 | ID0745 | 2018 | YES | Zhuzhou | Asia | Eastern Asia |  |  | 4890000 | ... |  |  |  | 0.420000 | 0.240000 | 0.340000 | Upper middle income | EF CHINA | https://www.efchina.org/Reports-en/green-mobil... | 12614.06099 |
| East Asia and Pacific | China | Wenzhou | M10878 | ID0746 | 2018 | YES | Wenzhou | Asia | Eastern Asia |  |  | 1500000 | ... |  |  |  | 0.340000 | 0.300000 | 0.360000 | Upper middle income | EF CHINA | https://www.efchina.org/Reports-en/green-mobil... | 12614.06099 |
| East Asia and Pacific | China | Zhongshan | M10879 | ID0747 | 2018 | YES | Zhongshan | Asia | Eastern Asia |  |  | 7600000 | ... |  |  |  | 0.560000 | 0.190000 | 0.250000 | Upper middle income | EF CHINA | https://www.efchina.org/Reports-en/green-mobil... | 12614.06099 |
| East Asia and Pacific | China | Zibo | M10880 | ID0748 | 2018 | YES | Zibo | Asia | Eastern Asia |  |  | 2100000 | ... |  |  |  | 0.450000 | 0.320000 | 0.230000 | Upper middle income | EF CHINA | https://www.efchina.org/Reports-en/green-mobil... | 12614.06099 |
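Once the three-level index is in place, loc accepts a partial key from the outermost level inward, and xs can select on an inner level by name. A minimal sketch on a toy frame built from a few of the rows above (only the year and Car columns kept):

```python
import pandas as pd

# A few rows from the output above, with a subset of the columns
toy = pd.DataFrame({
    'region':  ['Europe', 'Europe', 'Europe', 'Europe', 'North America'],
    'Country': ['Germany', 'Germany', 'Denmark', 'United Kingdom', 'United States'],
    'City':    ['Aachen', 'Aachen', 'Aarhus', 'Aberdeen', 'Abilene'],
    'year':    [2003, 2017, 2004, 2008, 2019],
    'Car':     [0.520000, 0.465350, 0.447917, 0.504950, 0.987899],
}).set_index(['region', 'Country', 'City'])

# Sorting the index avoids a PerformanceWarning on multi-level lookups
toy = toy.sort_index()

# Partial key: every row for Europe (the region level is dropped from the result)
europe = toy.loc['Europe']

# Tuple key down to the second level: every German row, indexed by City
germany = toy.loc[('Europe', 'Germany')]

# xs selects on an inner level without naming the outer ones
aachen = toy.xs('Aachen', level='City')

print(europe.shape, germany.shape, aachen.shape)
```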


In [33]:
%%timeit
df['CityID'].nunique()

94 μs ± 5.53 μs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)


In [35]:
df.info()

<class 'pandas.core.frame.DataFrame'>
MultiIndex: 1092 entries, ('Europe', 'Germany', 'Aachen') to ('East Asia and Pacific', 'China', 'Yantai')
Data columns (total 22 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   CityID           1092 non-null   object 
 1   ObsID            1092 non-null   object 
 2   year             1092 non-null   int64  
 3   LastObservation  1092 non-null   object 
 4   metro_names      1092 non-null   object 
 5   continent        1092 non-null   object 
 6   subregion        1092 non-null   object 
 7   state_name       1092 non-null   object 
 8   state_abbr       1092 non-null   object 
 9   population       1092 non-null   int64  
 10  longitude        1092 non-null   float64
 11  latitude         1092 non-null   float64
 12  Walking          910 non-null    float64
 13  Cycling          910 non-null    float64
 14  Motorbykes       6 non-null      float64
 15  Active           1092 non-null   flo

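Correction: rather than calling set_index after loading the data, we can pass index_col to read_csv, which builds the multi-index while reading the file. Using the pyarrow engine can also speed up parsing.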

In [26]:
df = pd.read_csv('ModalShare.csv',
                 engine='pyarrow',  # the pyarrow engine (needs the pyarrow package) usually parses CSV files faster
                 index_col=['region', 'Country', 'City'])  # build the multi-index while reading
df

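The original and corrected versions should produce identical data frames. A minimal sketch checking this on a tiny in-memory CSV with two rows and a few columns copied from the data; it uses the default parser rather than pyarrow, so it runs without the optional pyarrow package:

```python
import io
import pandas as pd

# Two rows from the dataset, with only a few of its columns
csv_text = """region,Country,City,CityID,year,Car
Europe,Germany,Aachen,M10001,2003,0.52
Europe,Denmark,Aarhus,M10002,2004,0.447917
"""

index_cols = ['region', 'Country', 'City']

# Original approach: read, then set the index
via_set_index = pd.read_csv(io.StringIO(csv_text)).set_index(index_cols)

# Corrected approach: let read_csv build the multi-index while parsing
via_index_col = pd.read_csv(io.StringIO(csv_text), index_col=index_cols)

# Raises AssertionError if the index, columns, values, or dtypes differ
pd.testing.assert_frame_equal(via_set_index, via_index_col)
print(via_index_col.index.names)
```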


**How many distinct cities were represented in this research? Which 10 cities were surveyed the most times?**
On the face of it, this doesn't seem like a particularly challenging question. But different cities can share a name, and some cities were surveyed more than once, so simply counting city names or rows won't give the right answer.

Fortunately, the dataset gives each city a unique ID (such as M10001 for Aachen), stored in the CityID column. Counting the unique values in CityID tells us how many distinct cities were included.
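To see why counting names isn't enough, here's a sketch with made-up rows (the IDs are hypothetical, not from the dataset) in which two different cities share the name Portland and another city appears twice:

```python
import pandas as pd

# Hypothetical rows: two distinct Portlands, plus one city surveyed twice
toy = pd.DataFrame({
    'CityID': ['X001', 'X002', 'X003', 'X003'],
    'City':   ['Portland', 'Portland', 'Graz', 'Graz'],
    'year':   [2015, 2015, 2010, 2018],
})

print(len(toy))                  # 4 rows (observations)
print(toy['City'].nunique())     # 2 names
print(toy['CityID'].nunique())   # 3 distinct cities
```

Counting rows overcounts (repeat surveys), while counting names undercounts (shared names); only the ID gets it right.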

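One option would be to take the CityID column, run drop_duplicates on it, and then run count on the result. Another is to run value_counts on CityID and take the length of the resulting index. Both are timed below; for comparison, nunique, timed earlier, does the job in a single call and was the fastest of the three: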



In [31]:
%%timeit
df['CityID'].drop_duplicates().count() 

125 μs ± 3.97 μs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)


In [32]:
%%timeit
len(df['CityID'].value_counts().index)

343 μs ± 38.1 μs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
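As long as CityID has no missing values (df.info() reports 1092 non-null entries), these approaches and nunique should all give the same count. A minimal sketch on a made-up series confirming that they agree, and that each one skips NaN by default:

```python
import numpy as np
import pandas as pd

# Made-up IDs, including a repeat and a missing value
ids = pd.Series(['M1', 'M2', 'M2', 'M3', np.nan])

n_unique = ids.nunique()                  # NaN excluded by default (dropna=True)
n_dedup = ids.drop_duplicates().count()   # drop_duplicates keeps one NaN, but count() skips it
n_vc = len(ids.value_counts().index)      # value_counts drops NaN by default

print(n_unique, n_dedup, n_vc)
```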


**Which 10 cities were surveyed the most times?**

Here, it was pretty clear that I would want to use value_counts. But what would I run value_counts on? And how would I then use the results?

I was actually able to run value_counts on the index itself. Index objects aren't pandas series, but they aren't entirely unlike series, either: we can run many series methods on them, even when (as here) the index is a multi-index.

I thus ran value_counts on df.index. The result of value_counts is a series whose index contains the unique values and whose values are their counts, sorted (by default) in descending order of frequency. The index of that result thus tells us which cities were surveyed most (and least) often, and running head on it gives the 10 most frequently surveyed cities:


In [40]:
df.index.value_counts().head(10)

(Europe, Austria, Graz)                                   8
(Europe, Austria, Vienna)                                 6
(Latin America and Caribbean, Argentina, Buenos Aires)    6
(Europe, Norway, Oslo)                                    5
(Europe, Germany, Leipzig)                                5
(Europe, Germany, Dusseldorf)                             5
(Europe, Germany, Erfurt)                                 5
(Europe, Belgium, Ghent)                                  4
(Sub-Saharan Africa, South Africa, Cape Town)             4
(Europe, United Kingdom, London)                          4
dtype: int64
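Two caveats about this result. First, the last three cities shown were each surveyed four times; if other cities were also surveyed four times, head(10) drops them arbitrarily, whereas nlargest with keep='all' keeps every city tied at the cutoff. Second, because the index holds names, two different cities with the same name in the same country would be counted together; counting CityID avoids that. (Whether such collisions occur in this dataset isn't visible from the output above.) A minimal sketch combining both fixes on a small made-up frame, where TOP_N = 2 stands in for 10:

```python
import pandas as pd

# Made-up observations; the index mirrors the (region, Country, City) layout
toy = pd.DataFrame({
    'region':  ['Europe'] * 6 + ['North America'] * 3,
    'Country': ['Austria'] * 4 + ['Norway'] * 2 + ['United States'] * 3,
    'City':    ['Graz'] * 3 + ['Vienna'] + ['Oslo'] * 2 + ['Portland'] * 3,
    'CityID':  ['A1'] * 3 + ['A2'] + ['N1'] * 2 + ['U1', 'U1', 'U2'],
}).set_index(['region', 'Country', 'City'])

TOP_N = 2  # stands in for 10

# Counting the name-based index merges the two Portlands (U1 and U2) into one entry
by_name = toy.index.value_counts()

# Count observations per city ID, keeping every city tied at the cutoff
counts = toy['CityID'].value_counts().nlargest(TOP_N, keep='all')

# Attach a readable (region, Country, City) label to each ID
labels = (toy.reset_index()
             .drop_duplicates('CityID')
             .set_index('CityID')[['region', 'Country', 'City']])
top = labels.loc[counts.index].assign(surveys=counts)

print(top)
```

Here the ID-based count returns three cities rather than two, because N1 and U1 are tied at the cutoff.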