# Imports

In [1]:
import pandas as pd
import requests
from src.utils import run_query

# Task description

To get a better understanding of your Python skills and how you approach a problem, we ask you to send us a solution to this coding task.

Imagine we have a new customer who provides us with a small dataset and wants to get some insights about their fleet.

Your task is to perform a short data exploration on the provided data and then apply simple anomaly detection techniques. <br>
Support your findings with plots where this is possible and makes sense. <br>
Do not forget to comment your code and to write a final conclusion when you are done.

As a dataset for this small task, we use data on trains operating in Finland, which is accessible through an API (https://www.digitraffic.fi/en/railway-traffic/).

For a quick start, we provide some examples of how you can query the data:

In [2]:
# live data for the current trains
live_api = 'https://rata.digitraffic.fi/api/v1/live-trains/'

# data for one train and one particular date
train_day_api = 'https://rata.digitraffic.fi/api/v1/trains/2021-02-01/4'

# data for one particular day for all trains
day_api = 'https://rata.digitraffic.fi/api/v1/trains/2021-02-01'
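The per-day URL pattern above can be extended to cover a full month. The following is a minimal sketch, not a reference solution: the helper names `build_day_urls` and `load_days` are hypothetical, and it assumes each per-day endpoint returns a JSON list of train records with the same shape as the live endpoint.

```python
import pandas as pd
import requests

# Base URL taken from the day_api example above
BASE_URL = "https://rata.digitraffic.fi/api/v1/trains/"


def build_day_urls(start, end):
    """Return one per-day URL for every date from start to end (inclusive)."""
    days = pd.date_range(start=start, end=end, freq="D")
    return [BASE_URL + day.strftime("%Y-%m-%d") for day in days]


def load_days(urls, timeout=30):
    """Fetch each URL and stack the per-day records into one DataFrame.

    Assumption: every response body is a JSON list of train records.
    """
    frames = []
    for url in urls:
        response = requests.get(url, timeout=timeout)
        response.raise_for_status()  # fail loudly on HTTP errors
        frames.append(pd.DataFrame(response.json()))
    return pd.concat(frames, ignore_index=True)


# Build the URLs for February 2021 (no request is sent here)
urls = build_day_urls("2021-02-01", "2021-02-28")
```

Calling `load_days(urls)` would then send one request per day; keeping URL construction separate from fetching makes it easy to retry or cache individual days.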

In [3]:
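# parse the JSON response directly; passing a raw JSON string to pd.read_json is deprecated in recent pandas versions
data = pd.DataFrame(requests.get(live_api).json())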
data.head()

Unnamed: 0,trainNumber,departureDate,operatorUICCode,operatorShortCode,trainType,trainCategory,commuterLineID,runningCurrently,cancelled,version,timetableType,timetableAcceptanceDate,timeTableRows
0,11,2022-03-21,10,vr,IC,Long-distance,,False,False,282222818949,REGULAR,2021-11-05T10:07:11.000Z,"[{'stationShortCode': 'HKI', 'stationUICCode':..."
1,12,2022-03-21,10,vr,IC,Long-distance,,False,False,282223022949,REGULAR,2021-11-05T10:07:11.000Z,"[{'stationShortCode': 'JNS', 'stationUICCode':..."
2,28,2022-03-21,10,vr,IC,Long-distance,,False,False,282222766031,REGULAR,2021-11-05T10:07:11.000Z,"[{'stationShortCode': 'OL', 'stationUICCode': ..."
3,29,2022-03-21,10,vr,IC,Long-distance,,False,False,282222974675,REGULAR,2021-11-05T10:07:11.000Z,"[{'stationShortCode': 'HKI', 'stationUICCode':..."
4,37,2022-03-21,10,vr,IC,Long-distance,,False,False,282222232433,REGULAR,2021-11-05T10:07:11.000Z,"[{'stationShortCode': 'HKI', 'stationUICCode':..."
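The `timeTableRows` column in the output above holds a list of dictionaries per train, so it usually has to be flattened before it can be explored. One possible sketch uses `pd.json_normalize`; the helper name `flatten_timetable` is hypothetical, the sample frame is made up for illustration (only the field names `stationShortCode` and `stationUICCode` are visible in the output above, and the UIC code values are placeholders), and real rows are expected to carry more fields.

```python
import pandas as pd

# Illustrative stand-in for the API result; station UIC codes are placeholders
sample = pd.DataFrame({
    "trainNumber": [11, 12],
    "departureDate": ["2022-03-21", "2022-03-21"],
    "timeTableRows": [
        [{"stationShortCode": "HKI", "stationUICCode": 1},
         {"stationShortCode": "PSL", "stationUICCode": 10}],
        [{"stationShortCode": "JNS", "stationUICCode": 460}],
    ],
})


def flatten_timetable(df):
    """Return one row per timetable entry, keeping the train identifiers."""
    records = df.to_dict("records")
    return pd.json_normalize(
        records,
        record_path="timeTableRows",            # the embedded list to expand
        meta=["trainNumber", "departureDate"],  # copied onto every expanded row
    )


rows = flatten_timetable(sample)
```

Each train then contributes as many rows as it has timetable entries, which makes per-station aggregations straightforward.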


# Your Approach

Perform the following steps:
1. Load the data from the API (use at least one month of data)
2. Explore the data to understand what information it provides, and describe what you find. 
    - See this step also as preparation for the next one
    - Consider that the column 'timeTableRows' contains further, embedded information
    - Important aspects could also be cancellations, delays and their causes


3. Try to find anomalies taking into account: 
    - Number of stations on the way
    - Total time from entry point to final destination
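As one possible starting point for step 3, the two criteria can be turned into per-journey features and screened with a simple interquartile-range (IQR) rule. This is only a sketch under stated assumptions: it expects flattened timetable rows with a timestamp column, here called `scheduledTime` (an assumed field name, not shown in the output above), and the helper names, station codes, and synthetic durations are made up for illustration.

```python
import pandas as pd


def journey_features(rows, time_col="scheduledTime"):
    """Compute the number of stations and total travel time per journey."""
    rows = rows.copy()
    rows[time_col] = pd.to_datetime(rows[time_col], utc=True)
    features = rows.groupby(["departureDate", "trainNumber"]).agg(
        n_stations=("stationShortCode", "nunique"),
        start=(time_col, "min"),
        end=(time_col, "max"),
    )
    duration = features["end"] - features["start"]
    features["total_minutes"] = duration.dt.total_seconds() / 60
    return features.drop(columns=["start", "end"])


def iqr_outliers(values, k=1.5):
    """Flag values lying more than k * IQR outside the quartiles."""
    q1, q3 = values.quantile([0.25, 0.75])
    iqr = q3 - q1
    return (values < q1 - k * iqr) | (values > q3 + k * iqr)


# Synthetic journeys: train number -> total duration in minutes
durations = {1: 60, 2: 62, 3: 58, 4: 61, 5: 59, 6: 600}
start = pd.Timestamp("2021-02-01T08:00:00Z")
rows = pd.DataFrame([
    {"departureDate": "2021-02-01", "trainNumber": number,
     "stationShortCode": code,
     "scheduledTime": (start + pd.Timedelta(minutes=offset)).isoformat()}
    for number, minutes in durations.items()
    for code, offset in [("HKI", 0), ("PSL", 5), ("TPE", minutes)]
])

features = journey_features(rows)
flagged = features[iqr_outliers(features["total_minutes"])]
```

The IQR rule is just one simple choice; on real data, it is likely worth comparing journeys only within the same train type or route, since long-distance and commuter trains differ systematically in both features.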

In [4]:
query = """
    {
      currentlyRunningTrains(where: {operator: {shortCode: {equals: "vr"}}}) {
        trainNumber
        departureDate
        trainLocations(where: {speed: {greaterThan: 30}}, orderBy: {timestamp: DESCENDING}, take: 1) {
          speed
          timestamp
          location
        }
      }
    }
"""

results = run_query(query)

In [5]:
results

{'currentlyRunningTrains': [{'trainNumber': 265,
   'departureDate': '2022-03-21',
   'trainLocations': [{'speed': 40,
     'timestamp': '2022-03-22T00:18:01Z',
     'location': [23.12178, 63.833935]}]},
  {'trainNumber': 266,
   'departureDate': '2022-03-21',
   'trainLocations': [{'speed': 129,
     'timestamp': '2022-03-22T00:18:10Z',
     'location': [23.174961, 61.960384]}]},
  {'trainNumber': 269,
   'departureDate': '2022-03-21',
   'trainLocations': [{'speed': 120,
     'timestamp': '2022-03-22T00:18:07Z',
     'location': [23.03883, 63.086845]}]},
  {'trainNumber': 273, 'departureDate': '2022-03-21', 'trainLocations': None},
  {'trainNumber': 274,
   'departureDate': '2022-03-21',
   'trainLocations': [{'speed': 127,
     'timestamp': '2022-03-22T00:18:00Z',
     'location': [22.778595, 63.590073]}]},
  {'trainNumber': 276,
   'departureDate': '2022-03-21',
   'trainLocations': [{'speed': 96,
     'timestamp': '2022-03-22T00:18:13Z',
     'location': [24.569452, 64.089966]}]},
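The nested result above can be flattened into a table for further analysis. The sketch below uses a two-entry excerpt of that output; the helper name `locations_to_frame` is hypothetical, and reading `location` as `[longitude, latitude]` is an assumption based on the value ranges, not something the output states.

```python
import pandas as pd

# Two entries copied from the output above
results = {"currentlyRunningTrains": [
    {"trainNumber": 265, "departureDate": "2022-03-21",
     "trainLocations": [{"speed": 40,
                         "timestamp": "2022-03-22T00:18:01Z",
                         "location": [23.12178, 63.833935]}]},
    {"trainNumber": 273, "departureDate": "2022-03-21",
     "trainLocations": None},
]}


def locations_to_frame(results):
    """Return one row per reported location; trains without locations are skipped."""
    records = []
    for train in results["currentlyRunningTrains"]:
        # trainLocations may be None, as for train 273 above
        for loc in train["trainLocations"] or []:
            lon, lat = loc["location"]  # assumed order: [longitude, latitude]
            records.append({
                "trainNumber": train["trainNumber"],
                "departureDate": train["departureDate"],
                "speed": loc["speed"],
                "timestamp": pd.Timestamp(loc["timestamp"]),
                "longitude": lon,
                "latitude": lat,
            })
    return pd.DataFrame(records)


frame = locations_to_frame(results)
```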

# Final conclusion

Please write 2-3 sentences about your findings and how you would interpret and explain them to a customer.