In [17]:
import pandas as pd
import plotly.express as px

In [18]:
# Reading the cleaned and merged dataset
df = pd.read_csv('Project_datasets_merged_filtered.csv', index_col=None)

#### Introduction
In this part, the main focus is on the differences in public transport use between age groups, and on how the covid pandemic may have affected these differences.  
Different age groups may indeed have been affected differently by the pandemic, depending on other factors.  
Of course, factors linked to age are not independent of the other factors studied in this project: for example, older people are less likely to be employed than people between 25 and 34 years old.  


In [19]:
# Creation of a new column indicating the gap between the public transport
# participation of a certain group and the population-wide value
# ("Total persons" group) for the same year

# Reference value for each year: public transport use of the "Total persons" group
total_by_year = (df[df['Personal characteristics'] == 'Total persons']
                 .groupby('Topic')['Use of public transport']
                 .first())
# Subtract the reference value of the matching year from each row
df['gap with mean'] = df['Use of public transport'] - df['Topic'].map(total_by_year)

df

| Unnamed: 0 | Personal characteristics | Topic | Traffic participation | Use of public transport | gap with mean |
|---|---|---|---|---|---|
| 0 | Age: 12 to 17 years | 2010 | 82.1 | 10.0 | 3.2 |
| 1 | Age: 12 to 17 years | 2011 | 82.8 | 9.8 | 2.7 |
| 2 | Age: 12 to 17 years | 2012 | 82.1 | 8.9 | 2.2 |
| 3 | Age: 12 to 17 years | 2013 | 82.0 | 9.8 | 2.8 |
| 4 | Age: 12 to 17 years | 2014 | 83.3 | 8.9 | 2.2 |
| ... | ... | ... | ... | ... | ... |
| 450 | Total persons | 2018 | 82.9 | 8.6 | 0.0 |
| 451 | Total persons | 2019 | 81.9 | 8.6 | 0.0 |
| 452 | Total persons | 2020 | 73.8 | 3.9 | 0.0 |
| 453 | Total persons | 2021 | 78.1 | 4.2 | 0.0 |


In [20]:
# Creation of a column to compare the level of public transport participation
# of a certain year and for a certain group to the participation level 
# of 2018 for the same group

# Public transport use of each group in 2018
ref_2018 = (df[df['Topic'] == 2018]
            .groupby('Personal characteristics')['Use of public transport']
            .first())
# Difference with the 2018 level of the same group
# (real NaN, not the string 'NaN', when the group has no 2018 data)
df['comparison to 2018 level'] = (df['Use of public transport']
                                  - df['Personal characteristics'].map(ref_2018))

df

| Unnamed: 0 | Personal characteristics | Topic | Traffic participation | Use of public transport | gap with mean | comparison to 2018 level |
|---|---|---|---|---|---|---|
| 0 | Age: 12 to 17 years | 2010 | 82.1 | 10.0 | 3.2 | -1.7 |
| 1 | Age: 12 to 17 years | 2011 | 82.8 | 9.8 | 2.7 | -1.9 |
| 2 | Age: 12 to 17 years | 2012 | 82.1 | 8.9 | 2.2 | -2.8 |
| 3 | Age: 12 to 17 years | 2013 | 82.0 | 9.8 | 2.8 | -1.9 |
| 4 | Age: 12 to 17 years | 2014 | 83.3 | 8.9 | 2.2 | -2.8 |
| ... | ... | ... | ... | ... | ... | ... |
| 450 | Total persons | 2018 | 82.9 | 8.6 | 0.0 | 0.0 |
| 451 | Total persons | 2019 | 81.9 | 8.6 | 0.0 | 0.0 |
| 452 | Total persons | 2020 | 73.8 | 3.9 | 0.0 | -4.7 |
| 453 | Total persons | 2021 | 78.1 | 4.2 | 0.0 | -4.4 |


In [21]:
# Creation of a column to compare in percentages the level of public transport participation
# of a certain year and for a certain group to the participation level 
# of 2018 for the same group 

# Public transport use of each group in 2018
ref_2018 = (df[df['Topic'] == 2018]
            .groupby('Personal characteristics')['Use of public transport']
            .first())
ref_per_row = df['Personal characteristics'].map(ref_2018)  # NaN if no 2018 data
# Relative difference with the 2018 level of the same group, in %
df['comparison to 2018 level (%)'] = (df['Use of public transport'] - ref_per_row) / ref_per_row * 100

In [22]:
# Creation of a column to compare in percentages the level of public transport participation
# of a certain year and for a certain group to the participation level 
# of 2020 for the same group 

# Public transport use of each group in 2020
ref_2020 = (df[df['Topic'] == 2020]
            .groupby('Personal characteristics')['Use of public transport']
            .first())
ref_per_row = df['Personal characteristics'].map(ref_2020)  # NaN if no 2020 data
# Relative difference with the 2020 level of the same group, in %
df['comparison to 2020 level (%)'] = (df['Use of public transport'] - ref_per_row) / ref_per_row * 100
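The cells computing the 2018 and 2020 comparisons repeat the same logic: look up each group's value in a reference year, subtract it, and optionally divide by it. A minimal sketch of a reusable helper, assuming the column names used in this notebook; `compare_to_year` is a hypothetical name and the toy values below are made up, not taken from the dataset:

```python
import pandas as pd

def compare_to_year(df, year, relative=False,
                    group_col='Personal characteristics',
                    value_col='Use of public transport',
                    year_col='Topic'):
    """Difference between each row's value and the same group's value in `year`.

    Returns a Series aligned with df, NaN where the group has no row for `year`.
    If relative is True, the difference is expressed in % of the reference value.
    """
    # First value per group in the reference year (mirrors .iloc[0] in a loop)
    ref = (df[df[year_col] == year]
           .groupby(group_col)[value_col]
           .first())
    ref_per_row = df[group_col].map(ref)   # NaN for groups missing `year`
    diff = df[value_col] - ref_per_row
    return diff / ref_per_row * 100 if relative else diff

# Hypothetical toy data, for illustration only
toy = pd.DataFrame({
    'Personal characteristics': ['A', 'A', 'A', 'B', 'B'],
    'Topic': [2018, 2020, 2022, 2020, 2022],
    'Use of public transport': [10.0, 5.0, 8.0, 2.0, 3.0],
})
toy['comparison to 2018 level (%)'] = compare_to_year(toy, 2018, relative=True)
toy['comparison to 2020 level (%)'] = compare_to_year(toy, 2020, relative=True)
print(toy)
```

With such a helper each comparison column becomes one line, e.g. `df['comparison to 2020 level (%)'] = compare_to_year(df, 2020, relative=True)`. Groups without a row for the reference year get a real `NaN`, which keeps the column numeric.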

In [23]:
# Creation of a dataset only containing the public transport participation
# for the "Total persons" group, i.e. all categories grouped together

mask = df['Personal characteristics'].str.startswith('Total')
df_total = df[mask]

In [24]:
# Creation of a dataset focusing on age groups (df_age), and of a second one
# (df_age2) that also contains the "Total persons" group for comparison

mask = df['Personal characteristics'].str.startswith('Age')
df_age = df[mask]
df_age2 = pd.concat([df_total, df_age])

In [25]:
# Plot of the use of public transport for different age groups
# across the years

fig = px.line(data_frame=df_age2, x="Topic", y="Use of public transport", color='Personal characteristics', title='Use of public transport depending on age across the years')

# Slightly fade all lines, then highlight the "Total persons" trace
# (px.line names each trace after its 'Personal characteristics' value)
fig.update_traces(opacity=0.85)
fig.update_traces(line_width=3, line_dash='longdashdot', opacity=1,
                  selector=dict(name='Total persons'))
fig.show()

#### Observations:
The drop in public transport use during covid is sharper for the younger categories (under 35 years) than for the other age groups. This may be explained by the closing of schools and universities; younger workers may also use remote working more than older ones.  
Since older age groups used public transport less before covid, their decrease in participation is also smaller in absolute terms, but their decrease relative to pre-covid levels may be larger.  
  
It is noticeable that people between 25 and 34 years old used public transport more than people 12 to 17 years old before the pandemic, whereas after the pandemic their levels of use are quite similar. This may be explained by an increase in remote working.  
Looking at the older data from 2010, the general trend seems to be a stagnation from 2010 to 2017, followed by a noticeable rise in public transport participation between 2017 and 2018. However, since these years correspond to the junction between two datasets, this variation may also come from a change in methodology or data gathering.  
  
Public transport use has yet to recover to its pre-covid levels, but the post-covid trends differ: some groups, especially the younger age groups, seem to recover more quickly than older ones.  
  
Regarding teleworking, a survey in [1] shows that people aged 35-54 make up 52% of teleworkers in the Netherlands but only 43% of the population. Since this older group is over-represented among teleworkers, telework cannot completely explain the drop in public transport use for the younger working categories (18 to 34 years old). Besides, the graphs below also show that the relative drop of these younger groups compared to pre-covid levels is actually smaller than for other categories. Of course, teleworking can still help explain why the 12-17 age group caught up with the 25-34 age group in public transport participation after the pandemic. However, the participation of these two groups was already quite similar, so it is hard to explain this variation purely with causes linked to the pandemic.   

[1] Ton, Danique, Koen Arendsen, Menno De Bruyn, Valerie Severens, Mark Van Hagen, Niels Van Oort, and Dorine Duives. "Teleworking during COVID-19 in the Netherlands: Understanding Behaviour, Attitudes, and Future Intentions of Train Travellers". Transportation Research Part A: Policy and Practice 159 (May 2022): 55-73. https://doi.org/10.1016/j.tra.2022.03.019.
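The over-representation argument can be made explicit with the two shares quoted from [1]. A small sketch of the arithmetic; the "other ages" ratio is derived by complement, which assumes both shares refer to the same overall population:

```python
# Shares quoted from [1] for the 35-54 age group
share_of_teleworkers = 0.52   # 35-54 year-olds among teleworkers
share_of_population = 0.43    # 35-54 year-olds in the population

# Representation ratio: > 1 means over-represented among teleworkers
ratio_35_54 = share_of_teleworkers / share_of_population
# All other ages combined, by complement (assumption: same reference population)
ratio_other = (1 - share_of_teleworkers) / (1 - share_of_population)

print(f"35-54: {ratio_35_54:.2f}, other ages: {ratio_other:.2f}")
```

Note that "other ages" lumps together everyone outside 35-54 (including people over 55), so this does not isolate the 18-34 group on its own.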

In [26]:
# Creation of a graph showing the gap between the value of a group
# and the mean value across the years

px.line(data_frame=df_age, x="Topic", y="gap with mean", color='Personal characteristics', title = 'Gap with the mean value across the years')

#### Observations:
It is interesting to note that for people aged 35 and over, the gap with the mean stays quite constant during the pandemic, and even shrinks for some categories. Post-covid gaps are generally smaller but still negative. People aged 35 and over may therefore have been less impacted by the pandemic than other categories. This may be explained by the fact that many younger people, who make up the bulk of public transport traffic, stopped using public transport when schools and universities closed, whereas certain essential workplaces stayed open.
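Whether the gap "stays constant" can be checked group by group by comparing the gap just before (2019) and during (2020) the pandemic. A minimal sketch, assuming `df` keeps its long format (one row per group and year); the group labels and values below are hypothetical:

```python
import pandas as pd

# Hypothetical values, for illustration only (not from the project dataset)
toy = pd.DataFrame({
    'Personal characteristics': ['younger', 'younger', 'older', 'older'],
    'Topic': [2019, 2020, 2019, 2020],
    'gap with mean': [15.0, 6.0, -3.0, -2.5],
})

# One row per group, one column per year
gaps = toy.pivot(index='Personal characteristics', columns='Topic', values='gap with mean')
# Change in the gap when the pandemic hit; values near 0 mean the gap stayed constant
gaps['change 2019-2020'] = gaps[2020] - gaps[2019]
print(gaps)
```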
 

In [27]:
# Creation of a graph showing the
# comparison between 2022 and 2018 public transport participation levels

px.bar(data_frame = df_age2[(df_age2['Topic'] == 2022)], x='Personal characteristics',
        y='comparison to 2018 level', color='Personal characteristics',
        title = 'Comparison of 2022 public transport participation to the 2018 levels')

#### Observations:
In terms of raw participation, the group with the sharpest drop is the 18 to 24 years age group. As mentioned before, this may be explained by the fact that during the pandemic many universities closed and moved courses online, which may have reduced students' transport needs even after the pandemic. People may also have preferred individual means of transport (bikes, cars) during the pandemic to avoid restrictions, and kept this habit afterwards.
The second most impacted group is people from 25 to 34 years old, which may be explained by an increase in remote work after the pandemic.  
  
However, we have to keep in mind that the variation is bigger in these groups because they already had the highest participation before covid. Comparing them with other groups may therefore be easier using percentages.
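The difference between raw and relative variations can be illustrated with the "Total persons" values from the output table above (8.6 in 2018, 3.9 in 2020) and a hypothetical low-use group whose values are made up for illustration:

```python
def change(before, after):
    """Return (absolute change in percentage points, relative change in %)."""
    return after - before, (after - before) / before * 100

# "Total persons", 2018 -> 2020, values from the output table above
total_abs, total_rel = change(8.6, 3.9)
# Hypothetical low-use group, for illustration only
low_abs, low_rel = change(2.0, 0.8)

print(f"Total: {total_abs:+.1f} points, {total_rel:+.1f}%")
print(f"Low-use group: {low_abs:+.1f} points, {low_rel:+.1f}%")
```

The low-use group loses fewer percentage points but a larger share of its initial level, so percentage-based comparisons can rank groups differently from raw ones.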

In [28]:
# Creation of a graph showing the
# difference between 2022 and 2018 levels relative to the 2018 level

px.bar(data_frame = df_age2[(df_age2['Topic'] == 2022)], x='Personal characteristics',
        y='comparison to 2018 level (%)', color='Personal characteristics',
        title = 'Comparison of 2022 public transport participation to the 2018 levels (%)')

In the above graph we compare 2022 (post-covid) levels to 2018 (pre-covid) levels.  
  
In terms of percentages, the average variation (the variation for the total) is -29%.  
Even though the variation was larger in raw numbers for the younger age groups, proportionally the drop was bigger for age groups over 35 years old.
One explanation could be that, since these groups were already using public transport less than other groups, people from these older age groups were more likely to stop taking public transport, or at least to significantly decrease their use of it. Furthermore, these groups may have more alternatives to public transport than younger age groups, for example access to a car.  
We can thus make a hypothesis: the younger categories contain more regular users of public transport, whose use has not decreased that much, whereas older age groups are mostly composed of occasional users, who could switch to another mode of transport more easily. This interpretation seems to be supported by the fact that persons between 12 and 17 years old, who have few alternatives to public transport, at least for long distances, experienced the smallest relative decrease in public transport use.  
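The regular/occasional hypothesis can be phrased as a toy two-type model: a group's post-covid level is the weighted average of what its regular and occasional users kept. A sketch under purely hypothetical parameters (the retention rates and user shares are assumptions, not estimates from the data):

```python
def relative_change(share_regular, keep_regular=0.9, keep_occasional=0.5):
    """Relative change (%) in a group's public transport use in a toy two-type model.

    share_regular: fraction of the group's pre-covid use coming from regular users.
    keep_regular / keep_occasional: fraction of their use each type keeps after
    the pandemic (hypothetical values, identical for every group).
    """
    share_occasional = 1 - share_regular
    kept = share_regular * keep_regular + share_occasional * keep_occasional
    return (kept - 1) * 100

young = relative_change(share_regular=0.8)  # hypothetical: mostly regular users
old = relative_change(share_regular=0.3)    # hypothetical: mostly occasional users
print(f"younger group: {young:.0f}%, older group: {old:.0f}%")
```

With identical behaviour per user type, the group with more occasional users shows the larger relative drop. This is consistent with the hypothesis but does not prove it.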
  


In [29]:
# Creation of a graph showing the
# difference between 2020 and 2018 levels relative to the 2018 level
px.bar(data_frame = df_age2[(df_age2['Topic'] == 2020)], x='Personal characteristics',
        y='comparison to 2018 level (%)', color='Personal characteristics',
        title = 'Comparison of 2020 public transport participation to the 2018 levels (%)')

In the above graph we compare 2020 levels (during the pandemic) to pre-covid levels.  
    
It is noticeable that the gaps between the Total persons category and the other categories are not as big as after the pandemic. 
Age may thus have played a more important role in how the different groups resumed or changed their use of public transport after the pandemic than in how they behaved during the pandemic itself.  
Besides, since younger groups experienced a smaller drop (relative to their pre-covid levels) during the pandemic, it was easier for their public transport participation to recover after covid.
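The 2020-vs-2018, 2022-vs-2020 and 2022-vs-2018 comparisons of a group are linked because growth factors multiply: written as fractions, (1 + c) = (1 + a)(1 + b). A short sketch of this identity; `chain` is a hypothetical helper and the percentages are illustrative, not dataset values:

```python
def chain(pct_2020_vs_2018, pct_2022_vs_2020):
    """2022 vs 2018 change (%) implied by the two intermediate changes (%)."""
    factor = (1 + pct_2020_vs_2018 / 100) * (1 + pct_2022_vs_2020 / 100)
    return (factor - 1) * 100

# Hypothetical: -60% during the pandemic, then +50% growth from the 2020 level
print(round(chain(-60, 50), 1))   # -40.0
```

For instance, a group that lost 60% of its use in 2020 needs +150% growth to get back to its 2018 level, whereas a group that lost 20% needs only +25%, so a smaller drop makes a full recovery much easier.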


In [30]:
# Creation of a graph showing the
# difference between 2022 and 2020 levels relative to the 2020 level
px.bar(data_frame = df_age2[(df_age2['Topic'] == 2022)], x='Personal characteristics',
        y='comparison to 2020 level (%)', color='Personal characteristics',
        title = 'Comparison of 2022 public transport participation to the 2020 levels (%)')

When comparing 2022 levels to 2020 levels in the above graph, several things appear.  
  
Even though people younger than 24 years old were less impacted by the pandemic (relative to their pre-covid levels, their public transport participation stayed closer than that of other age groups), the growth rate of their participation between 2020 and 2022 is slightly below average.  
One possible explanation is a base effect: the sharper the drop during covid, the easier it is for a category to show a high growth rate afterwards, given its low participation. This is probably the case for the oldest age group: their participation level dropped to 1.1 during the pandemic, so even a small increase (+0.7 percentage points) creates the impression of a very swift recovery. 
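The base effect can be quantified. Reading the "+0.7" above as an absolute increase of 0.7 percentage points (an assumption about the unit), the same increase gives very different growth rates depending on the starting level; 8.6 is the 2018 "Total persons" level from the output table above, used only as a comparison base:

```python
def growth_rate(before, after):
    """Relative growth between two levels, in %."""
    return (after - before) / before * 100

# Oldest age group: 1.1 during the pandemic, +0.7 points afterwards (from the text)
oldest = growth_rate(1.1, 1.1 + 0.7)
# Same +0.7-point increase on a higher base (2018 "Total persons" level)
higher_base = growth_rate(8.6, 8.6 + 0.7)
print(f"{oldest:.1f}% vs {higher_base:.1f}%")
```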
  
The public transport use of people 50 to 64 years old is recovering more slowly than average, even though their relative decrease during covid was larger. Conversely, people from 25 to 34 years old are recovering more quickly, even though their drop in use was less severe than that of people from 50 to 64 years old. One explanation considered previously was the difference in car ownership, but another factor could be that older working people simply reduced their travel (more teleworking, fewer working hours, or possibly more unemployment leading to less need for transport).
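One way to compare recovery speeds independently of the size of the drop is the share of the lost level that has been regained since 2020. A sketch with a hypothetical `share_recovered` helper and made-up levels (not dataset values):

```python
def share_recovered(pre, during, post):
    """Fraction of the pandemic drop regained (0 = none, 1 = back to pre-covid)."""
    drop = pre - during
    # No drop means the ratio is undefined
    return (post - during) / drop if drop else float('nan')

# Hypothetical levels (pre-covid, 2020, 2022), for illustration only
print(share_recovered(10.0, 4.0, 7.0))   # 0.5
print(share_recovered(5.0, 1.0, 2.0))    # 0.25
```

Values can exceed 1 (overshoot above the pre-covid level) or be negative (further decline after 2020).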

### In conclusion
Even though the age groups most affected at first seemed to be those with the highest participation before covid (the younger age groups, between 12 and 34 years old), the graphs showed that, although their raw decrease was larger than for other age groups, younger age groups were less affected than older ones once the drop is considered relative to pre-covid levels: their relative decrease in public transport use is lower than that of older age groups both during and after the pandemic. The exception is the 25-34 years old group, which suffered a sharp drop during the pandemic but recovered faster than other groups.  
  
Overall, age is an important factor when assessing public transport use. The most noticeable effect is that people from 18 to 24 years old use public transport far more than any other category, and this seems to continue after the pandemic. There are of course differences between the other age groups too (among people over 18, younger groups tend to use public transport more), but the levels of people from 18 to 24 years old are far above the rest.
  
It is hard to say whether the post-pandemic trend, which shows a quick rise in public transport participation for all age groups, will continue until public transport use recovers, or whether use will stabilize at a level lower than before the pandemic. Even if no definitive conclusion can be drawn on how long the pandemic will affect public transport use, it is important to notice that these trends are far less predictable than the pre-pandemic ones, which were more stable and presumably allowed public transport companies to forecast demand more accurately.
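The claim that the pre-pandemic trends were more stable could be quantified, for example, by the standard deviation of year-on-year changes. A sketch using the "Total persons" values for 2018-2021 from the output table above and a hypothetical stable series for comparison (the metric is one possible choice among several, an assumption of this sketch):

```python
import statistics

def yoy_volatility(values):
    """Standard deviation of year-on-year changes (percentage points)."""
    changes = [b - a for a, b in zip(values, values[1:])]
    return statistics.stdev(changes)

# Hypothetical stable series, for illustration only
stable = [8.0, 8.1, 7.9, 8.0, 8.1]
# "Total persons", 2018-2021, from the output table above
disrupted = [8.6, 8.6, 3.9, 4.2]

print(f"stable: {yoy_volatility(stable):.2f}, disrupted: {yoy_volatility(disrupted):.2f}")
```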
  
Overall, the pandemic does not seem to have changed the structure of public transport use across age groups, but it may have reinforced some existing trends: younger people use public transport more, and older people use it less.
