![logo-big_carla-data-analytics-platform.png](attachment:logo-big_carla-data-analytics-platform.png)
The DnA team welcomes you to your first personal Jupyter notebook!


In this session (if you continue reading this notebook) we will first show you the basics of Jupyter, the IDE (Integrated Development Environment), and Python, the programming language that you use inside it. At the end of the learning part you will also have the opportunity to participate in the competition (by clicking on the "**Start Task**" button at the end of this notebook).

Please note that **participation in the competition will be possible only from 12.07.2021 16:00 CET until 13.07.2021 13:00 CET**. Learning and experimenting, on the other hand, you can do at any time :)

Everything you need to know to solve the task is in this document, so do not be scared :)

You can think of a Jupyter notebook as a normal notebook or Word document where you can write your notes, reminders, visualizations, and links. In addition, you can mix those with so-called cells of code that you can execute step by step. This makes it a perfect way of explaining what you are doing, even to somebody who does not completely understand the code.

To run a cell, click on it and then press the | ▶ Run | button above or simply use **shift + enter** on your keyboard.

**Trivia**
Why did we select Python for this exercise?
Because it is a great fit: it is a general-purpose language used in many domains, it is easy to read and understand, and it is extremely extensible with libraries and third-party modules. We as an organization recognize the value of data, and this is the main reason why this exercise focuses on the pandas library (https://pandas.pydata.org/) and some of the basic data analysis tasks it offers.

# Warm up task

So let's start!

Click on the cell below and run it either with the | ▶ Run | button above or simply with **shift + enter** on your keyboard.
The output of the command you executed will be shown under the cell and is read-only.
At any time you can add more cells or simply add code to an existing cell. If you want to comment out a line of code, just put **#** in front of it.
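
As a small illustration of how **#** works, here is a sketch of a cell that mixes code and comments. The variable names and numbers are made up for this example only.

```python
# This whole line is a comment: Python ignores it when the cell runs.
cars_per_day = 4  # a comment can also follow code on the same line
days = 5

# The next line is "commented out", so it is never executed:
# days = 100

total = cars_per_day * days
print(total)
```

If you remove the **#** in front of `days = 100` and run the cell again, the printed result changes, because that line is now executed.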

In [None]:
###### Run me ######
from IPython.display import Image
Image(filename='/tmp/assets/congrats.jpeg')

# Data transformation with pandas
Pandas is one of the most popular tools data scientists use to analyze data.
Below you will find a few examples of how to use pandas to get the data, display it, and transform it to your needs. In order to use pandas you have to tell Python to import the pandas library with the **import** command and define the name under which you will refer to it from then on - in this case **pd**.
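
If you want to experiment without the API, the sketch below shows the import and builds a tiny table by hand. Everything in it is an assumption made for illustration: the brands are placeholders, the numbers are invented, and the **Year** column name is hypothetical. The table is called **demo** so that it does not overwrite the real **data** used later.

```python
import pandas as pd  # "pd" is the conventional short alias for pandas

# Invented numbers for two placeholder brands, for illustration only
demo = pd.DataFrame(
    {
        "Car brand": ["Brand A", "Brand B", "Brand A", "Brand B"],
        "Year": [2019, 2019, 2020, 2020],  # hypothetical column name
        "Deliveries number": [100, 150, 120, 130],
    }
)

print(demo.shape)          # (number of rows, number of columns)
print(list(demo.columns))  # the column names
```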

## Get the data
The example data contains the number of delivered cars from 2015 to 2020 for selected German car brands:
**Audi**, **BMW**, **Mercedes Benz**, **Porsche**.
To make it more fun we have made the data available over an API. So let's now get the data from the API using Python and pandas, and then store it in Python's memory as a pandas DataFrame named **data**.

In [None]:
import warnings
warnings.filterwarnings('ignore')

import pandas as pd  # import pandas library for the data analysis purposes
import requests  # import requests library for getting the data from API
import io  # import the io module to convert the response to the file format known by pandas read_csv() method

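# get the data with a GET request to the specific endpoint
response = requests.get(url="DNA_URL_HTTPS_DEV/api/warmup-data",
                        headers={"Content-Type": "application/octet-stream"},
                        verify=False)  # verify=False skips the TLS certificate check
response.raise_for_status()  # stop here with an error if the request failed
text = response.content.decode("utf-8")  # turn the raw bytes into text
# wrap the text in a file-like object and use the first column as the row index
data = pd.read_csv(io.StringIO(text), sep=",", index_col=0)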
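
To see what the last two lines of that cell do without calling the API, here is a minimal offline sketch. The CSV text stands in for the decoded response; its rows, the placeholder brands, and the **Year** column are invented for illustration and are not the real warm-up data.

```python
import io
import pandas as pd

# Stand-in for the decoded API response; the rows are invented
csv_text = (
    ",Car brand,Year,Deliveries number\n"
    "0,Brand A,2019,100\n"
    "1,Brand B,2019,150\n"
    "2,Brand A,2020,120\n"
)

# io.StringIO wraps the string so read_csv can treat it like an open file;
# index_col=0 uses the first (unnamed) column as the row index
demo = pd.read_csv(io.StringIO(csv_text), sep=",", index_col=0)
print(demo)
```

Without `index_col=0`, pandas would keep that first column as ordinary data and add a fresh row index of its own.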

## Display the data

If you would like to display the content of this DataFrame called **data**, simply type its name, as in the following cell, and execute the cell.

In [None]:
data

Now you know which columns are there. In case you want to display just a few of them, you can use double square brackets **[[]]** with the column names inside, separated by commas. The cell below will display only the **Car brand** column, try it...

In [None]:
data[["Car brand"]]
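
The sketch below, on a small invented table called **demo** (placeholder brands, made-up numbers), shows the difference between double and single brackets and how to select more than one column.

```python
import pandas as pd

# Invented example table, for illustration only
demo = pd.DataFrame(
    {
        "Car brand": ["Brand A", "Brand B"],
        "Deliveries number": [100, 150],
    }
)

one_column_table = demo[["Car brand"]]  # double brackets: a DataFrame with one column
one_column_values = demo["Car brand"]   # single brackets: a Series (just the values)
two_columns = demo[["Car brand", "Deliveries number"]]  # several columns at once

print(type(one_column_table).__name__)
print(type(one_column_values).__name__)
print(list(two_columns.columns))
```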

Sometimes a dataset is too big to scroll through. Using the method **head(n)** we can display the first **n** rows of the data. The following command will display only the first 5 rows, try it...

In [None]:
data.head(5)

You may have guessed it already: using the method **tail(n)** you can display the last **n** rows of the data. Try this as well...

In [None]:
data.tail(5)
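
Here is a short sketch of **head(n)** and **tail(n)** with other values of **n**, using an invented one-column table called **demo**. If you leave **n** out, both methods return 5 rows.

```python
import pandas as pd

# Invented numbers, for illustration only
demo = pd.DataFrame({"Deliveries number": [10, 20, 30, 40, 50, 60, 70, 80]})

first_three = demo.head(3)   # the first 3 rows
last_two = demo.tail(2)      # the last 2 rows
default_head = demo.head()   # no argument: n defaults to 5

print(first_three)
print(last_two)
print(len(default_head))
```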

## Transform the data

Usually you are interested in finding the highest or lowest values, so, as you may have guessed, there is also a way to sort :) Using the method **sort_values(column_name)** you can sort the data by the specified column in ascending order.
For the full list of parameters, including how to change the sorting order, check the [documentation for sort_values()](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.sort_values.html).
You will probably say: but I can do this in Excel too. True, but remember that Excel has limits and predefined possibilities, while Python is far more flexible, and if you are ever missing something you can write your own library and publish it as a FOSS component :)
So let's sort on the column **Deliveries number**, try it...

In [None]:
data.sort_values("Deliveries number")
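
To reverse the order, **sort_values()** accepts the parameter **ascending=False**. The sketch below uses an invented table called **demo** (placeholder brands, made-up numbers) and also shows that sorting returns a new table and leaves the original untouched.

```python
import pandas as pd

# Invented example table, for illustration only
demo = pd.DataFrame(
    {
        "Car brand": ["Brand A", "Brand B", "Brand C"],
        "Deliveries number": [120, 100, 150],
    }
)

# Largest values first
descending = demo.sort_values("Deliveries number", ascending=False)

print(descending)
print(demo)  # the original table keeps its order
```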

Using the method **groupby(column_name)** ( [groupby-docs](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.groupby.html) ) we can group the data by a specified column and then aggregate each group, for example with **count()** ( [groupby-count-docs](https://pandas.pydata.org/docs/reference/api/pandas.core.groupby.GroupBy.count.html) ) to count the number of records in each group or **sum()** ( [groupby-sum-docs](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.core.groupby.GroupBy.sum.html) ) to sum the values in each group.

In the example below we will use the **sum()** aggregation method to sum the number of delivered cars for each car brand across all years in the dataset.

In [None]:
data.groupby("Car brand")[["Deliveries number"]].sum()
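
To compare **sum()** and **count()** side by side, here is a sketch on an invented table called **demo**. The brands are placeholders, the numbers are made up, and the **Year** column name is hypothetical.

```python
import pandas as pd

# Invented example table, for illustration only
demo = pd.DataFrame(
    {
        "Car brand": ["Brand A", "Brand B", "Brand A", "Brand B"],
        "Year": [2019, 2019, 2020, 2020],  # hypothetical column name
        "Deliveries number": [100, 150, 120, 130],
    }
)

grouped = demo.groupby("Car brand")[["Deliveries number"]]

totals = grouped.sum()    # adds up the values within each brand
counts = grouped.count()  # counts the records within each brand

print(totals)
print(counts)
```

Selecting the column before aggregating means only **Deliveries number** is summed, so other columns such as the year are never added up by accident.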

In pandas we can chain methods to get the desired result. Combining the above methods gives us the car brand with the highest number of delivered cars across all years in the dataset (and yes, we cheated a little by removing one brand we do not want to mention here; remember, they sell more units, but we sell a lot more quality and luxury).

In [None]:
# sort_values() sorts in ascending order by default, so tail(1) at the end returns the largest value.
data.groupby("Car brand")[["Deliveries number"]].sum().sort_values("Deliveries number").tail(1)
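
There is more than one way to reach the same answer. The sketch below, on an invented table called **demo** (placeholder brands, made-up numbers), shows two alternatives to the chain with **tail(1)**: sorting in descending order and taking **head(1)**, or asking directly for the group with the largest total via **idxmax()**.

```python
import pandas as pd

# Invented example table, for illustration only
demo = pd.DataFrame(
    {
        "Car brand": ["Brand A", "Brand B", "Brand A", "Brand B"],
        "Deliveries number": [100, 150, 120, 130],
    }
)

# Total deliveries per brand (a Series indexed by brand)
totals = demo.groupby("Car brand")["Deliveries number"].sum()

# Alternative 1: sort descending, then keep the first row
top_via_sort = totals.sort_values(ascending=False).head(1)

# Alternative 2: idxmax() returns the index label of the largest value
top_brand = totals.idxmax()

print(top_via_sort)
print(top_brand)
```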

***

## Competition Task

Congratulations - by now you know all the Python & pandas commands that you will need to complete the task!
To make your life easier, we have already provided the code that you need to run to prepare for the task. As in the previous case, we used pandas and API calls to retrieve the test data set and store it in a variable called **data_task**.
The dataset contains the number of CaaS clusters across all regions, over various periods of time. Analyze this dataset - it will help you to solve the task faster 😊

## Get the data

In [None]:
import warnings
warnings.filterwarnings('ignore')

import pandas as pd  # import pandas library for the data analysis purposes
import requests  # import requests library for getting the data from API
import io  # import the io module to convert the response to the file format known by pandas read_csv() method

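# get the data with a GET request to the specific endpoint
response_task = requests.get(url="DNA_URL_HTTPS_DEV/api/task-data",
                             headers={"Content-Type": "application/octet-stream"},
                             verify=False)  # verify=False skips the TLS certificate check
response_task.raise_for_status()  # stop here with an error if the request failed
text_task = response_task.content.decode("utf-8")  # turn the raw bytes into text
# wrap the text in a file-like object so that read_csv() can parse it
data_task = pd.read_csv(io.StringIO(text_task), sep=",")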
data_task
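
A quick first look at a new dataset often saves time later. The sketch below shows a few common inspection calls on an invented table called **demo_task**. Its column names (**Region**, **Period**, **Clusters number**) and values are hypothetical and will most likely differ from the real **data_task**, so check the actual column names first.

```python
import pandas as pd

# Invented stand-in table; the real data_task may look different
demo_task = pd.DataFrame(
    {
        "Region": ["Region X", "Region Y", "Region X", "Region Y"],
        "Period": ["P1", "P1", "P2", "P2"],
        "Clusters number": [3, 5, 4, 6],
    }
)

print(demo_task.shape)               # how many rows and columns
print(list(demo_task.columns))       # the column names
print(demo_task.dtypes)              # the data type of each column
print(demo_task["Region"].unique())  # the distinct values in one column

summary = demo_task.describe()       # basic statistics for numeric columns
print(summary)
```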

## Perform necessary transformations to get the answer

# So now finally your **Task** for the competition and a chance to win a cool AI board 😊



In [None]:
# Code for the task given above goes here.