# Scraping [Transportstyrelsen](https://www.transportstyrelsen.se/sv/vagtrafik/fordon/aga-kopa-eller-salja-fordon/import-och-export-av-fordon/fordonsimport-och-ursprungskontroll/)

## Transportstyrelsen

See the [README](README.md) on why, what and how.

## Case handling duration

If more cases are handled per day, the processing time should go down.

In [3]:
import pandas as pd

def read_data():
    return pd.read_csv("transportstyrelsen_data.csv")

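def sanitize(df):
    # Parse both date columns so they support datetime arithmetic.
    # format='ISO8601' requires pandas 2.0 or later.
    df['Date'] = pd.to_datetime(df['Date'], format='ISO8601')
    df['Evaluating cases'] = pd.to_datetime(df['Evaluating cases'])

    return df

df = sanitize(read_data())
# Label each row with its year and week number (%U: weeks start on Sunday).
df['Week'] = df['Date'].dt.strftime('%Y-%U')
grouped_by_week = df.groupby('Week', as_index=True)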
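
The `%U` directive used for the `Week` label counts weeks from the first Sunday of the year, which is not the same as ISO 8601 week numbering. The sketch below compares the two on a few dates made up for illustration (they are not taken from the scraped data). The ISO label is built with `dt.isocalendar()`, and the variable names are hypothetical.

```python
import pandas as pd

# Hypothetical dates for illustration only: a Saturday, a Sunday and a Monday.
dates = pd.Series(pd.to_datetime(["2024-12-07", "2024-12-08", "2024-12-09"]))

# Sunday-based week label, as used for the 'Week' column in this notebook.
sunday_based = dates.dt.strftime("%Y-%U")

# ISO 8601 week label (weeks start on Monday), built from isocalendar().
iso = dates.dt.isocalendar()
iso_label = iso["year"].astype(str) + "-" + iso["week"].astype(str).str.zfill(2)

comparison = pd.DataFrame({
    "date": dates.dt.strftime("%Y-%m-%d"),
    "sunday_based": sunday_based,
    "iso": iso_label,
})
print(comparison)
```

The two conventions group Sundays differently: `%U` starts a new week on Sunday, ISO 8601 on Monday.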

### Processing time evolution per week

In [10]:
processing_aggregate_name_mapping = {'mean': 'Mean', 'min': 'Fastest', 'max': 'Slowest'}

In [11]:
# Mean, shortest and longest processing time within each week
grouped_by_week['Processing time'].agg(['mean', 'min', 'max']).round({'mean': 2}).rename(columns=processing_aggregate_name_mapping)

| Week | Mean | Fastest | Slowest |
|---|---|---|---|
| 2024-49 | 5.48 | 5.29 | 5.92 |
| 2024-50 | 5.43 | 5.43 | 5.43 |
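
The aggregate-then-rename step above could also be written with pandas named aggregation, which sets the output column names directly. This is only a sketch of an alternative, not how the notebook does it, and the sample values are hypothetical rather than scraped data.

```python
import pandas as pd

# Hypothetical processing times for illustration only.
sample = pd.DataFrame({
    "Week": ["2024-49", "2024-49", "2024-50"],
    "Processing time": [5.0, 6.0, 5.5],
})

# Named aggregation: each keyword becomes an output column.
weekly = (
    sample.groupby("Week")["Processing time"]
    .agg(Mean="mean", Fastest="min", Slowest="max")
    .round({"Mean": 2})
)
print(weekly)
```

This avoids the separate name mapping, at the cost of keeping the display names inline with the aggregation.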


### Dates handled

In [4]:
# How many days' worth of cases were handled since the previous row:
# the difference between consecutive 'Evaluating cases' dates, in days.
df['Progressed cases'] = df['Evaluating cases'].diff().dt.days
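
To make the `diff().dt.days` step concrete, here is a small sketch on made-up readings (hypothetical values, not scraped data). Each reading stands for the application date that was under evaluation at one scrape.

```python
import pandas as pd

# Hypothetical 'Evaluating cases' dates from four consecutive scrapes.
evaluating = pd.Series(
    pd.to_datetime(["2024-11-01", "2024-11-01", "2024-11-04", "2024-11-05"])
)

# Difference to the previous reading, expressed in whole days.
progressed = evaluating.diff().dt.days
print(progressed.tolist())
```

The first reading has no predecessor, so its difference is missing (`NaN`). That turns the column into floats, which is why the weekly sums are displayed as `6.0` rather than `6`.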

In [5]:
grouped_by_week['Progressed cases'].agg(['sum']).rename(columns={"sum": "Processed dates"})

| Week | Processed dates |
|---|---|
| 2024-49 | 6.0 |
| 2024-50 | 3.0 |
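
One rough way to test the idea that handling more cases per day shortens the processing time is to correlate the two columns. The helper below is a hypothetical sketch, not part of the notebook. It assumes the `Progressed cases` and `Processing time` columns used above, and the example frame holds made-up values only.

```python
import pandas as pd

def throughput_correlation(frame: pd.DataFrame) -> float:
    """Pearson correlation between dates progressed and processing time."""
    # Drop rows where either value is missing (e.g. the first diff row).
    subset = frame[["Progressed cases", "Processing time"]].dropna()
    return subset["Progressed cases"].corr(subset["Processing time"])

# Hypothetical values for illustration only; not scraped data.
example = pd.DataFrame({
    "Progressed cases": [None, 0.0, 1.0, 3.0, 2.0],
    "Processing time": [5.9, 5.9, 5.8, 5.5, 5.4],
})

correlation = throughput_correlation(example)
print(round(correlation, 2))
```

A negative value would be consistent with the assumption, but a same-row correlation ignores any delay between handling cases and the published processing time, so it is only a first check.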
