# <p align="center">Siemens Sales Forecast</p>

---

## <p align="center">*1 - Exploratory Data Analysis & Pre-processing*</p>

---

### 👥 **Team Members**
- **Ana Farinha** *(Student Number: 20211514)*  
- **António Oliveira** *(Student Number: 20211595)*  
- **Mariana Neto** *(Student Number: 20211527)*  
- **Salvador Domingues** *(Student Number: 20240597)*  

📅 **Date:** *April 1, 2025*  
📍 **Prepared for:** *Siemens*  

**GitHub Repo:** https://github.com/MGN19/Siemens-forecast

---

# ToC

<a class="anchor" id="top"></a>


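1. [Import Libraries & Data](#1.-Import-Libraries-&-Data) <br><br>

2. [Data Wrangling](#2.-Data-Wrangling) <br><br>

3. [Data Exploration](#3.-Data-Exploration) <br><br>

4. [Feature Exploration](#4.-Feature-Exploration) <br><br>

5. [Data Cleaning & Preprocessing](#5.-Data-Cleaning-&-Preprocessing) <br><br>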

# 1. Import Libraries & Data

```python
import pandas as pd

import utils as u

pd.set_option('display.max_columns', None)
```

**Data**

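```python
# test set template and daily sales (semicolon-separated CSV files)
test = pd.read_csv('./data/Case2_Test Set Template.csv', sep=';')

sales_data = pd.read_csv('./data/Case2_Sales data.csv', sep=';')

# monthly market indicators (Excel sheet, first column used as index)
market_data = pd.read_excel('./data/Case2_Market data.xlsx', index_col='Unnamed: 0')
```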


# 2. Data Wrangling

<a href="#top">Top &#129033;</a>

- Make the necessary changes to the datasets so that they can be used for analysis.

## 2.1 Sales Data


```python
sales_data.head(2)
```

|   | DATE | Mapped_GCK | Sales_EUR |
|---|------|------------|-----------|
| 0 | 01.10.2018 | #1 | 0 |
| 1 | 02.10.2018 | #1 | 0 |


**Exploring the dataset structure with the `.info()` method**

The `.info()` method summarizes the dataset: the number of entries, the number of non-null values in each column, and the data type of each column.

```python
sales_data.info()
```

```text
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 9802 entries, 0 to 9801
Data columns (total 3 columns):
 #   Column      Non-Null Count  Dtype 
---  ------      --------------  ----- 
 0   DATE        9802 non-null   object
 1   Mapped_GCK  9802 non-null   object
 2   Sales_EUR   9802 non-null   object
dtypes: object(3)
memory usage: 229.9+ KB
```


With the information above, we can see the following:
- All three columns are stored with the `object` dtype, which means they are held as text. For analysis purposes, the `DATE` column will be converted to datetime, and the `Sales_EUR` column, which uses decimal commas, will be converted to float.
- Every column has 9802 non-null values out of 9802 entries. There are therefore no missing values, at least none encoded as nulls.

Based on this, we will first change the data types of these two columns so that the analysis can be done properly.

```python
# convert DATE from 'dd.mm.yyyy' strings to datetime
sales_data['DATE'] = pd.to_datetime(sales_data['DATE'], format='%d.%m.%Y')

# convert Sales_EUR from strings with decimal commas to floats
sales_data['Sales_EUR'] = (sales_data['Sales_EUR']
                           .str.replace(',', '.', regex=False)
                           .astype(float))

sales_data.head(1)
```

|   | DATE | Mapped_GCK | Sales_EUR |
|---|------|------------|-----------|
| 0 | 2018-10-01 | #1 | 0.0 |
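The comma replacement above can also be done when the file is loaded, because `pd.read_csv` accepts a `decimal` parameter. The sketch below shows this on a hypothetical three-line sample in the same layout. It assumes the real file uses no thousands separators, which the replacement above also assumes.

```python
import io

import pandas as pd

# Hypothetical sample in the assumed layout of 'Case2_Sales data.csv':
# semicolon-separated, dd.mm.yyyy dates and decimal commas in Sales_EUR
raw = io.StringIO(
    "DATE;Mapped_GCK;Sales_EUR\n"
    "01.10.2018;#1;0\n"
    "02.10.2018;#1;1234,5\n"
)

# decimal=',' makes read_csv parse '1234,5' as the float 1234.5 directly
sample = pd.read_csv(raw, sep=';', decimal=',')
sample['DATE'] = pd.to_datetime(sample['DATE'], format='%d.%m.%Y')

print(sample.dtypes)
```

On the real data this would mean adding `decimal=','` to the `read_csv` call in section 1, after which the `.str.replace` step is no longer needed.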


**Convert to Monthly Data**

```python
# aggregate daily sales into monthly totals per product group
sales_data["Date"] = sales_data["DATE"].dt.to_period("M")
monthly_sales = (sales_data
                 .groupby(["Date", "Mapped_GCK"])
                 .agg({"Sales_EUR": "sum"})
                 .reset_index())

monthly_sales.head(7)
```

|   | Date | Mapped_GCK | Sales_EUR |
|---|------|------------|-----------|
| 0 | 2018-10 | #1 | 36098918.79 |
| 1 | 2018-10 | #11 | 1021303.5 |
| 2 | 2018-10 | #12 | 28686.33 |
| 3 | 2018-10 | #13 | 27666.1 |
| 4 | 2018-10 | #14 | 5770.0 |
| 5 | 2018-10 | #16 | 333196.87 |
| 6 | 2018-10 | #20 | 4563.14 |
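Running the aggregation on toy data whose totals are known in advance is a quick way to check that the daily-to-monthly step behaves as intended. The sketch below uses hypothetical daily rows and the same `to_period("M")` plus `groupby` technique. It also checks that the overall total is preserved, and the same check could be run on `sales_data` and `monthly_sales`.

```python
import pandas as pd

# Hypothetical daily rows for two product groups spanning two months
daily = pd.DataFrame({
    "DATE": pd.to_datetime(["2018-10-01", "2018-10-15", "2018-11-03", "2018-10-02"]),
    "Mapped_GCK": ["#1", "#1", "#1", "#11"],
    "Sales_EUR": [100.0, 50.0, 20.0, 5.0],
})

# Map each date to its calendar month, then sum sales per (month, product group)
daily["Date"] = daily["DATE"].dt.to_period("M")
monthly = daily.groupby(["Date", "Mapped_GCK"], as_index=False)["Sales_EUR"].sum()
print(monthly)

# Aggregation should neither create nor lose sales
assert monthly["Sales_EUR"].sum() == daily["Sales_EUR"].sum()
```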


Pivot the monthly sales so that each product group (`Mapped_GCK`) becomes a column:

```python
pivoted_data = monthly_sales.pivot(index='Date',
                                   columns='Mapped_GCK',
                                   values='Sales_EUR')

pivoted_data.head(3)
```

| Date | #1 | #11 | #12 | #13 | #14 | #16 | #20 | #3 | #36 | #4 | #5 | #6 | #8 | #9 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2018-10 | 36098918.79 | 1021303.5 | 28686.33 | 27666.1 | 5770.0 | 333196.87 | 4563.14 | 8089465.96 | 6474.6 | 397760.69 | 2499061.19 | 369231.6 | 586052.74 | 3219.32 |
| 2018-11 | 5140760.0 | 1898844.8 | 1070.0 | 68180.0 | 17130.0 | 1377694.32 | 5798.14 | 11863001.51 | 21617.61 | 371322.42 | 8993944.04 | 473046.96 | 526292.77 | 1875.9 |
| 2018-12 | 37889612.12 | 1226122.0 | 17880.6 | 15655.18 | 0.0 | 4762524.66 | 918.65 | 8736859.39 | 13924.52 | 430100.96 | 6947507.31 | 999472.69 | 271490.71 | 0.0 |
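The product-group codes are strings, so the pivoted columns above are ordered lexicographically (`#1`, `#11`, ..., `#20`, `#3`, ...). If numeric order is preferred, the columns can be sorted on the number after `#`. The sketch below assumes every code has the form `#<integer>`. The helper `gck_number` is hypothetical and the values are placeholders.

```python
import pandas as pd

# Columns in the lexicographic order produced by pivot (placeholder values)
wide = pd.DataFrame([[1, 2, 3, 4]], columns=["#1", "#11", "#3", "#36"])

def gck_number(code: str) -> int:
    """Numeric part of a product-group code such as '#36' (assumed format)."""
    return int(code.lstrip("#"))

# Reorder columns by their numeric code: #1, #3, #11, #36
wide = wide[sorted(wide.columns, key=gck_number)]
print(list(wide.columns))
```

Applied to the real data, this would be `pivoted_data[sorted(pivoted_data.columns, key=gck_number)]`.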


## 2.2 Market Data

<a href="#top">Top &#129033;</a>

```python
market_data.head(3)
```

```text
,China,China.1,France,France.1,Germany,Germany.1,Italy,Italy.1,Japan,Japan.1,Switzerland,Switzerland.1,United Kingdom,United Kingdom.1,United States,United States.1,Europe,Europe.1,Unnamed: 19,Unnamed: 20,Unnamed: 21,Unnamed: 22,Unnamed: 23,Unnamed: 24,Unnamed: 25,Producer Prices,Producer Prices.1,Producer Prices.2,Producer Prices.3,Producer Prices.4,Producer Prices.5,production index,production index.1,production index.2,production index.3,production index.4,production index.5,production index.6,production index.7,production index.8,production index.9,production index.10,production index.11,production index.12,production index.13,production index.14,production index.15
Index 2010=100 (if not otherwise noted),Production Index Machinery & Electricals,Shipments Index Machinery & Electricals,Production Index Machinery & Electricals,Shipments Index Machinery & Electricals,Production Index Machinery & Electricals,Shipments Index Machinery & Electricals,Production Index Machinery & Electricals,Shipments Index Machinery & Electricals,Production Index Machinery & Electricals,Shipments Index Machinery & Electricals,Production Index Machinery & Electricals,Shipments Index Machinery & Electricals,Production Index Machinery & Electricals,Shipments Index Machinery & Electricals,Production Index Machinery & Electricals,Shipments Index Machinery & Electricals,Production Index Machinery & Electricals,Shipments Index Machinery & Electricals,World: Price of Base Metals,World: Price of Energy,World: Price of Metals & Minerals,World: Price of Natural gas index,"World: Price of Crude oil, average",World: Price of Copper,United States: EUR in LCU,United States: Electrical equipment,United Kingdom: Electrical equipment,Italy: Electrical equipment,France: Electrical equipment,Germany: Electrical equipment,China: Electrical equipment,United States: Machinery and equipment n.e.c.,World: Machinery and equipment n.e.c.,Switzerland: Machinery and equipment n.e.c.,United Kingdom: Machinery and equipment n.e.c.,Italy: Machinery and equipment n.e.c.,Japan: Machinery and equipment n.e.c.,France: Machinery and equipment n.e.c.,Germany: Machinery and equipment n.e.c.,United States: Electrical equipment,World: Electrical equipment,Switzerland: Electrical equipment,United Kingdom: Electrical equipment,Italy: Electrical equipment,Japan: Electrical equipment,France: Electrical equipment,Germany: Electrical equipment
date,MAB_ELE_PRO156,MAB_ELE_SHP156,MAB_ELE_PRO250,MAB_ELE_SHP250,MAB_ELE_PRO276,MAB_ELE_SHP276,MAB_ELE_PRO380,MAB_ELE_SHP380,MAB_ELE_PRO392,MAB_ELE_SHP392,MAB_ELE_PRO756,MAB_ELE_SHP756,MAB_ELE_PRO826,MAB_ELE_SHP826,MAB_ELE_PRO840,MAB_ELE_SHP840,MAB_ELE_PRO1100,MAB_ELE_SHP1100,RohiBASEMET1000_org,RohiENERGY1000_org,RohiMETMIN1000_org,RohiNATGAS1000_org,RohCRUDE_PETRO1000_org,RohCOPPER1000_org,WKLWEUR840_org,PRI27840_org,PRI27826_org,PRI27380_org,PRI27250_org,PRI27276_org,PRI27156_org,PRO28840_org,PRO281000_org,PRO28756_org,PRO28826_org,PRO28380_org,PRO28392_org,PRO28250_org,PRO28276_org,PRO27840_org,PRO271000_org,PRO27756_org,PRO27826_org,PRO27380_org,PRO27392_org,PRO27250_org,PRO27276_org
2004m2,16.940704,16.940704,112.091273,83.458866,82.623037,79.452532,124.289603,86.560493,109.33401,110.495272,91.221862,89.987275,111.353812,73.601265,107.6014,79.24023,97.122911,80.09853,54.039811,44.123338,48.747945,87.076974,39.639458,36.623832,1.2646,78.969864,80.757423,93.020027,,93.230453,,102.491722,97.597374,97.1,106.191977,116.790276,110.890034,118.274109,80.82901,117.723991,,81.1,120.706516,141.510864,106.161262,102.077057,85.9132
```


```python
# description and series code of each column (first two rows), transposed
info = market_data.T.iloc[:, :2]
info
```

|   | Index 2010=100 (if not otherwise noted) | date |
|---|---|---|
| China | Production Index Machinery & Electricals | MAB_ELE_PRO156 |
| China.1 | Shipments Index Machinery & Electricals | MAB_ELE_SHP156 |
| France | Production Index Machinery & Electricals | MAB_ELE_PRO250 |
| France.1 | Shipments Index Machinery & Electricals | MAB_ELE_SHP250 |
| Germany | Production Index Machinery & Electricals | MAB_ELE_PRO276 |
| Germany.1 | Shipments Index Machinery & Electricals | MAB_ELE_SHP276 |
| Italy | Production Index Machinery & Electricals | MAB_ELE_PRO380 |
| Italy.1 | Shipments Index Machinery & Electricals | MAB_ELE_SHP380 |
| Japan | Production Index Machinery & Electricals | MAB_ELE_PRO392 |
| Japan.1 | Shipments Index Machinery & Electricals | MAB_ELE_SHP392 |
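The two metadata rows shown above (description and series code) are dropped in the next steps. Keeping them as a small lookup table first preserves the mapping from each column to its source series, although doing so is optional. The sketch below runs on a hypothetical two-column miniature of the raw sheet. On the real data, the equivalent would be `market_data.iloc[:2].T`, taken before the rows are removed.

```python
import pandas as pd

# Hypothetical miniature of the raw market sheet: the first two rows hold a
# description and a series code per column, real observations follow
raw = pd.DataFrame(
    {"China": ["Production Index Machinery & Electricals", "MAB_ELE_PRO156", "16.94"],
     "China.1": ["Shipments Index Machinery & Electricals", "MAB_ELE_SHP156", "16.94"]},
    index=["Index 2010=100 (if not otherwise noted)", "date", "2004m2"],
)

# One row per column of the data, with its description and series code
metadata = raw.iloc[:2].T
metadata.columns = ["description", "series_code"]
print(metadata)
```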


Rename the columns using the mapping defined in the project's `utils` module (`u.rename_dict`):

```python
market_data = market_data.rename(columns=u.rename_dict)
```

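Move the dates from the index into a `Date` column and drop the first two rows. These rows hold each column's description and series code rather than observations.

```python
market_data = market_data.reset_index()
# rows 0 and 1 are metadata (description and series code), not observations
market_data = market_data.iloc[2:].rename(columns={'index': 'Date'})
market_data
```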

```text
,Date,CHI Production Index,CHI Shipments Index,FRA Production Index,FRA Shipments Index,GER Production Index,GER Shipments Index,ITA Production Index,ITA Shipments Index,JAP Production Index,JAP Shipments Index,SWI Production Index,SWI Shipments Index,UK Production Index,UK Shipments Index,USA Production Index,USA Shipments Index,Europe Production Index,Europe Shipments Index,(W) Price of Base Metals,(W) Price of Energy,(W) Price of Metals & Minerals,(W) Price of Natural gas index,"(W) Price of Crude oil, average",(W) Price of Copper,USA EUR to LCU Conversion Rate,USA EE Production Prices,UK EE Production Prices,ITA EE Production Prices,FRA EE Production Prices,GER EE Production Prices,CHI EE Production Prices,USA Machinery & Equipment Index,(W) Machinery & Equipment Index,SWI Machinery & Equipment Index,UK Machinery & Equipment Index,ITA Machinery & Equipment Index,JAP Machinery & Equipment Index,FRA Machinery & Equipment Index,GER Machinery & Equipment Index,USA EE Production Index,(W) EE Production Index,SWI EE Production Index,UK EE Production Index,ITA EE Production Index,JAP EE Production Index,FRA EE Production Index,GER EE Production Index
2,2004m2,16.940704,16.940704,112.091273,83.458866,82.623037,79.452532,124.289603,86.560493,109.33401,110.495272,91.221862,89.987275,111.353812,73.601265,107.6014,79.24023,97.122911,80.09853,54.039811,44.123338,48.747945,87.076974,39.639458,36.623832,1.2646,78.969864,80.757423,93.020027,,93.230453,,102.491722,97.597374,97.1,106.191977,116.790276,110.890034,118.274109,80.82901,117.723991,,81.1,120.706516,141.510864,106.161262,102.077057,85.9132
3,2004m3,23.711852,23.711852,136.327976,106.168192,100.556582,97.012918,143.411662,106.344544,140.884616,144.686166,85.866287,79.883583,127.558608,84.047595,110.187364,98.619024,113.783904,96.015929,54.666162,47.588957,49.256157,87.192705,42.592034,39.931055,1.2262,79.673569,80.962135,93.540268,,93.335678,,105.62748,113.224892,91.195116,121.625075,139.288391,141.176853,148.121841,102.130104,119.220779,,76.690307,138.30955,152.880234,140.288741,117.225685,97.670815
4,2004m4,24.435235,24.435235,117.791806,92.007646,89.653203,84.932358,129.083828,95.579673,105.853579,102.655769,85.622508,79.740802,108.732297,73.026027,108.166564,89.774031,101.715199,85.167236,54.872715,47.779013,49.423751,91.379923,42.650637,39.134854,1.1985,80.337639,80.757423,93.852425,,93.440903,,103.484955,100.16909,93.793535,104.965505,125.289566,105.648765,125.482231,90.961426,117.441124,,71.552403,115.55733,137.796875,106.271197,105.335777,87.253983
5,2004m5,23.708115,23.708115,109.002541,85.696486,86.880571,82.372794,135.590391,100.087039,101.864777,100.305285,85.378729,79.598021,110.6452,74.591883,108.425887,87.463813,101.275727,84.485767,51.230356,53.590898,46.468392,99.04452,47.517121,36.278433,1.2007,80.798828,80.757423,93.852425,,93.546127,,103.643944,99.581436,96.391954,105.885359,131.988998,101.990361,116.64975,88.082901,117.899216,,66.4145,119.269534,143.860535,101.60871,96.616508,84.675552
6,2004m6,27.009138,27.009138,133.785737,106.641482,99.010814,95.10874,136.424935,110.889719,120.33292,119.61638,85.13495,79.455239,122.02096,82.343346,110.569933,97.364496,112.057197,96.963294,52.876331,50.799575,47.803913,98.636267,44.967605,35.65738,1.2138,80.91349,80.552711,93.956467,,93.440903,,106.062668,109.27771,98.990373,118.252278,132.988922,122.136575,143.248734,100.978699,119.499107,,61.276596,128.849416,144.315308,116.655248,118.45871,95.401802
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
216,2021m12,310.763183,310.763183,100.565744,134.589504,118.103281,149.364286,94.006826,150.482735,127.771735,131.029703,106.704029,104.819189,101.273544,,107.040766,148.590371,123.076659,150.046922,125.20703,112.372958,116.715183,236.488368,92.188708,126.76124,1.1304,128.511261,,113.309631,108.18251,115.748863,98.1062,105.736748,134.598755,102.27753,90.350055,103.191399,136.975506,112.791885,129.188248,109.624107,132.281006,114.326241,121.065762,72.915611,109.005151,80.763306,97.773956
217,2022m1,235.956129,235.956129,85.743503,108.15632,94.55061,120.353403,86.851008,101.258277,110.460181,110.823532,103.49926,101.70157,95.003541,,111.052133,129.565798,103.199827,120.338095,133.219393,121.309886,125.229641,196.91114,106.173052,129.829146,1.1314,131.62851,,115.390617,111.037476,117.853386,98.280171,110.894371,117.489883,100.305236,85.44417,92.292313,117.861377,90.558372,92.343117,111.36467,122.236023,108.999212,112.324119,74.355736,95.369065,77.944954,98.599052
218,2022m2,235.956129,235.956129,90.60354,117.71577,103.987916,129.383676,106.583758,120.956538,117.879631,118.300232,100.294492,98.583952,98.458412,,116.336327,138.56033,113.500635,131.500126,138.905572,131.273215,131.176501,197.523679,118.348203,131.963648,1.1342,133.342178,,116.431107,112.057098,118.905647,98.714158,117.168167,124.627762,98.332942,89.021378,113.290565,124.710859,97.766502,102.820961,114.6884,127.373421,103.672183,115.55733,91.182419,103.950687,79.001831,106.128059
219,2022m3,329.413367,329.413367,107.843548,136.85872,121.308119,151.201314,124.637966,153.645142,152.000561,156.400634,97.089723,95.466333,121.993915,,117.654038,165.926217,133.13301,158.055622,149.890871,163.186834,141.283339,271.079906,142.200872,135.782207,1.1019,136.153778,,117.471596,112.362991,119.852684,99.021554,118.910912,149.375229,96.360648,109.155949,134.288818,160.954233,114.72081,122.049515,115.164093,152.452942,98.345154,145.254965,102.475998,133.743932,96.704582,119.948433
```


Convert the `Date` strings (e.g. `2004m2`) to monthly periods:

```python
# extract the year and the month number from strings such as '2004m2'
market_data["Year"] = market_data["Date"].str.extract(r"(\d{4})", expand=False).astype(int)
market_data["Month"] = market_data["Date"].str.extract(r"m(\d{1,2})", expand=False).astype(int)

# rebuild a proper date and store it as a monthly period
market_data["Date"] = pd.to_datetime(market_data["Year"].astype(str) + "-" + market_data["Month"].astype(str),
                                     format="%Y-%m").dt.to_period("M")

market_data.drop(['Year', 'Month'], axis=1, inplace=True)
```
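An alternative sketch, not the notebook's method, does the same conversion in one pass. It uses a single anchored regex with named groups `year` and `month`, which are column names that `pd.to_datetime` accepts when assembling dates from a DataFrame. With the anchored pattern, a malformed entry produces `NaN`, and `astype(int)` then raises instead of mis-parsing silently. The sample strings are hypothetical but follow the `2004m2` format shown above.

```python
import pandas as pd

dates = pd.Series(["2004m2", "2004m12", "2022m3"])

# Named groups become 'year' and 'month' columns; day is fixed to 1
parts = dates.str.extract(r"^(?P<year>\d{4})m(?P<month>\d{1,2})$").astype(int)
periods = pd.to_datetime(parts.assign(day=1)).dt.to_period("M")
print(periods.astype(str).tolist())
```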

Convert the values to floats, replacing decimal commas first. Values that cannot be parsed become `NaN`.

```python
# every column except Date may have been read as text, some with decimal commas
for column in market_data.columns.drop('Date'):
    values = market_data[column].astype(str).str.replace(',', '.', regex=False)
    market_data[column] = pd.to_numeric(values, errors='coerce')
```
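With `errors='coerce'`, any string that cannot be parsed silently becomes `NaN` and can no longer be told apart from a genuine gap. One way to check for this is to compare the raw and converted values. The sketch below uses a hypothetical raw column. On the real data, it would require keeping a copy of each column before the conversion above.

```python
import pandas as pd

# Hypothetical raw column: decimal comma, dot, a genuine gap and a stray token
raw = pd.Series(["97,1", "102.49", None, "n/a"])

converted = pd.to_numeric(raw.astype(str).str.replace(",", ".", regex=False),
                          errors="coerce")

# Present in the raw data but NaN after conversion, i.e. silently coerced
coerced = raw.notna() & converted.isna()
print(converted.tolist())
print(raw[coerced].tolist())
```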

```python
market_data.set_index('Date', inplace=True)
```

```python
market_data.head(4)
```

```text
Date,CHI Production Index,CHI Shipments Index,FRA Production Index,FRA Shipments Index,GER Production Index,GER Shipments Index,ITA Production Index,ITA Shipments Index,JAP Production Index,JAP Shipments Index,SWI Production Index,SWI Shipments Index,UK Production Index,UK Shipments Index,USA Production Index,USA Shipments Index,Europe Production Index,Europe Shipments Index,(W) Price of Base Metals,(W) Price of Energy,(W) Price of Metals & Minerals,(W) Price of Natural gas index,"(W) Price of Crude oil, average",(W) Price of Copper,USA EUR to LCU Conversion Rate,USA EE Production Prices,UK EE Production Prices,ITA EE Production Prices,FRA EE Production Prices,GER EE Production Prices,CHI EE Production Prices,USA Machinery & Equipment Index,(W) Machinery & Equipment Index,SWI Machinery & Equipment Index,UK Machinery & Equipment Index,ITA Machinery & Equipment Index,JAP Machinery & Equipment Index,FRA Machinery & Equipment Index,GER Machinery & Equipment Index,USA EE Production Index,(W) EE Production Index,SWI EE Production Index,UK EE Production Index,ITA EE Production Index,JAP EE Production Index,FRA EE Production Index,GER EE Production Index
2004-02,16.940704,16.940704,112.091273,83.458866,82.623037,79.452532,124.289603,86.560493,109.33401,110.495272,91.221862,89.987275,111.353812,73.601265,107.6014,79.24023,97.122911,80.09853,54.039811,44.123338,48.747945,87.076974,39.639458,36.623832,1.2646,78.969864,80.757423,93.020027,,93.230453,,102.491722,97.597374,97.1,106.191977,116.790276,110.890034,118.274109,80.82901,117.723991,,81.1,120.706516,141.510864,106.161262,102.077057,85.9132
2004-03,23.711852,23.711852,136.327976,106.168192,100.556582,97.012918,143.411662,106.344544,140.884616,144.686166,85.866287,79.883583,127.558608,84.047595,110.187364,98.619024,113.783904,96.015929,54.666162,47.588957,49.256157,87.192705,42.592034,39.931055,1.2262,79.673569,80.962135,93.540268,,93.335678,,105.62748,113.224892,91.195116,121.625075,139.288391,141.176853,148.121841,102.130104,119.220779,,76.690307,138.30955,152.880234,140.288741,117.225685,97.670815
2004-04,24.435235,24.435235,117.791806,92.007646,89.653203,84.932358,129.083828,95.579673,105.853579,102.655769,85.622508,79.740802,108.732297,73.026027,108.166564,89.774031,101.715199,85.167236,54.872715,47.779013,49.423751,91.379923,42.650637,39.134854,1.1985,80.337639,80.757423,93.852425,,93.440903,,103.484955,100.16909,93.793535,104.965505,125.289566,105.648765,125.482231,90.961426,117.441124,,71.552403,115.55733,137.796875,106.271197,105.335777,87.253983
2004-05,23.708115,23.708115,109.002541,85.696486,86.880571,82.372794,135.590391,100.087039,101.864777,100.305285,85.378729,79.598021,110.6452,74.591883,108.425887,87.463813,101.275727,84.485767,51.230356,53.590898,46.468392,99.04452,47.517121,36.278433,1.2007,80.798828,80.757423,93.852425,,93.546127,,103.643944,99.581436,96.391954,105.885359,131.988998,101.990361,116.64975,88.082901,117.899216,,66.4145,119.269534,143.860535,101.60871,96.616508,84.675552
```


# 3. Data Exploration

<a href="#top">Top &#129033;</a>

## 3.1 Initial Sales Data Exploration

<br>

**Statistics**

The `describe()` method returns summary statistics, such as the mean and standard deviation, for each product group. The result is transposed (`.T`) so that each product group becomes a row. Because all columns of `pivoted_data` are numeric, no `include` parameter is needed.

```python
pivoted_data.describe().T
```

| Mapped_GCK | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| #1 | 43.0 | 35911770.0 | 5905117.0 | 5140760.0 | 34408960.0 | 37323903.07 | 38888670.0 | 44483013.86 |
| #11 | 43.0 | 1532589.0 | 981159.8 | 18200.0 | 903599.2 | 1226122.0 | 2161184.0 | 3891447.76 |
| #12 | 43.0 | 192546.2 | 141016.8 | 1070.0 | 39235.21 | 172712.88 | 321795.0 | 445648.06 |
| #13 | 43.0 | 23468.77 | 16140.16 | 2550.31 | 10466.47 | 20663.64 | 30966.41 | 68180.0 |
| #14 | 43.0 | 11484.23 | 16116.17 | -2851.45 | 2015.025 | 5951.85 | 15305.89 | 76161.44 |
| #16 | 43.0 | 427701.8 | 782489.9 | 40360.08 | 137554.1 | 224501.9 | 328372.5 | 4762524.66 |
| #20 | 43.0 | 1821.555 | 2307.9 | 0.0 | 330.81 | 842.93 | 2543.02 | 8485.6 |
| #3 | 43.0 | 12405660.0 | 2577332.0 | 3804319.74 | 11059070.0 | 12317479.75 | 13775670.0 | 18686819.85 |
| #36 | 43.0 | 23223.58 | 42651.28 | 674.0 | 6244.495 | 12377.77 | 21995.05 | 253519.04 |
| #4 | 43.0 | 363423.4 | 155710.9 | 93226.32 | 279799.5 | 329430.96 | 428161.2 | 829442.33 |
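Mean monthly sales across the product groups in this table range from about 1.8 thousand (`#20`) to about 35.9 million (`#1`), so raw standard deviations cannot be compared directly. The coefficient of variation (std divided by mean) puts them on one scale. The sketch below applies it to four groups, with the means and standard deviations copied from the table above. On the full data, the equivalent would be `pivoted_data.std() / pivoted_data.mean()`. The CV is less meaningful for groups whose mean is close to zero or whose values can be negative, as with the negative minimum of `#14`.

```python
import pandas as pd

# Mean and std for a few product groups, copied from the describe() table above
stats = pd.DataFrame(
    {"mean": [35911770.0, 11484.23, 427701.8, 23223.58],
     "std": [5905117.0, 16116.17, 782489.9, 42651.28]},
    index=["#1", "#14", "#16", "#36"],
)

# Coefficient of variation: dispersion relative to the mean
stats["cv"] = stats["std"] / stats["mean"]
print(stats["cv"].sort_values(ascending=False).round(2))
```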


## 3.2 Initial Market Data Exploration

```python
market_data.info()
```

From this overview and the statistics below, we can see the following:
- `Date` is used as the index. This assumes that each month appears only once.
- All columns except `Date` hold index levels, prices or a conversion rate, so they are stored as floats.
- The `count` row of the statistics below shows missing values in `SWI Production Index`, `SWI Shipments Index`, `UK Shipments Index`, `USA Shipments Index`, `UK EE Production Prices`, `FRA EE Production Prices`, `CHI EE Production Prices`, `SWI Machinery & Equipment Index`, `(W) EE Production Index` and `SWI EE Production Index`. In the rows displayed in section 2.2, the French and Chinese EE producer prices and the world EE production index are empty at the start of the period (2004). The UK shipments index and the UK EE producer prices are empty in the most recent months. This suggests that several of these series start later or stop earlier, rather than having scattered gaps.

The data type conversion and the `Date` index were both handled in section 2.2.
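The claims above (one row per month, and gaps at the start or end of some series) can be checked directly. The sketch below uses a hypothetical monthly frame. It confirms that the index is unique, counts the gaps per column, and shows the first and last observed month of each incomplete column. On the real data, `df` would be `market_data`.

```python
import numpy as np
import pandas as pd

# Hypothetical monthly frame: one series starts late, one stops early
idx = pd.period_range("2004-02", periods=5, freq="M")
df = pd.DataFrame({
    "starts_late": [np.nan, np.nan, 1.0, 2.0, 3.0],
    "stops_early": [1.0, 2.0, 3.0, np.nan, np.nan],
    "complete": [1.0, 2.0, 3.0, 4.0, 5.0],
}, index=idx)

# Date can only serve as the index if every month appears once
assert df.index.is_unique

# Number of gaps plus first/last observed month, for incomplete columns only
gaps = df.isna().sum()
profile = pd.DataFrame({
    "n_missing": gaps,
    "first_valid": df.apply(lambda s: s.first_valid_index()),
    "last_valid": df.apply(lambda s: s.last_valid_index()),
})[gaps > 0]
print(profile)
```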

**Statistics**

```python
market_data.describe()
```

```text
,CHI Production Index,CHI Shipments Index,FRA Production Index,FRA Shipments Index,GER Production Index,GER Shipments Index,ITA Production Index,ITA Shipments Index,JAP Production Index,JAP Shipments Index,SWI Production Index,SWI Shipments Index,UK Production Index,UK Shipments Index,USA Production Index,USA Shipments Index,Europe Production Index,Europe Shipments Index,(W) Price of Base Metals,(W) Price of Energy,(W) Price of Metals & Minerals,(W) Price of Natural gas index,"(W) Price of Crude oil, average",(W) Price of Copper,USA EUR to LCU Conversion Rate,USA EE Production Prices,UK EE Production Prices,ITA EE Production Prices,FRA EE Production Prices,GER EE Production Prices,CHI EE Production Prices,USA Machinery & Equipment Index,(W) Machinery & Equipment Index,SWI Machinery & Equipment Index,UK Machinery & Equipment Index,ITA Machinery & Equipment Index,JAP Machinery & Equipment Index,FRA Machinery & Equipment Index,GER Machinery & Equipment Index,USA EE Production Index,(W) EE Production Index,SWI EE Production Index,UK EE Production Index,ITA EE Production Index,JAP EE Production Index,FRA EE Production Index,GER EE Production Index
count,219.0,219.0,219.0,219.0,219.0,219.0,219.0,219.0,219.0,219.0,218.0,218.0,219.0,201.0,219.0,218.0,219.0,219.0,219.0,219.0,219.0,219.0,219.0,219.0,219.0,219.0,201.0,219.0,184.0,219.0,196.0,219.0,219.0,218.0,219.0,219.0,219.0,219.0,219.0,219.0,208.0,218.0,219.0,219.0,219.0,219.0,219.0
mean,138.303637,138.303637,104.431918,105.316814,107.499126,114.898377,105.228363,105.735378,111.948146,112.670602,97.834543,94.784942,108.752949,95.957072,109.418255,114.160028,108.77999,110.551132,89.733341,92.558006,86.064857,103.367773,89.44652,86.344288,1.253503,102.185734,100.151243,102.164957,101.969436,103.173606,97.681389,108.668835,112.466417,99.717112,104.384938,109.486312,115.083174,109.11666,110.429205,110.902904,111.14707,94.592852,116.667327,95.450517,105.555794,96.844134,102.125494
std,78.883209,78.883209,18.918529,12.762209,11.861942,17.091571,23.509638,19.948183,15.489336,16.891947,8.241523,9.153899,12.096725,12.94617,7.89127,14.63338,11.839462,15.727859,20.810149,30.615367,21.410779,41.747371,30.736831,23.811521,0.12019,10.524079,10.549411,4.454948,2.646863,6.076401,3.235886,9.132414,12.411183,10.467987,13.222741,22.491001,17.408257,21.050102,14.351658,8.444573,12.321223,13.465043,13.205283,29.195353,12.592527,16.857775,9.959946
min,16.940704,16.940704,50.75668,64.420676,74.332913,71.787161,34.213427,45.19171,67.53194,64.372344,77.801503,74.639253,61.048022,57.462935,85.994448,79.24023,69.786633,71.158884,50.822012,31.63231,46.468392,33.992282,26.623391,35.65738,1.0543,78.969864,80.552711,93.020027,96.864647,93.230453,90.292319,83.197311,74.760971,77.952571,51.716208,33.797184,64.082432,54.416245,71.617737,91.535751,83.310173,56.832151,77.956292,34.487114,74.56552,44.829357,76.424583
25%,68.47774,68.47774,93.613505,97.452819,100.560897,103.149778,94.335162,95.985839,103.740049,103.453182,92.410183,88.22103,100.498418,86.199717,104.599964,104.596526,101.729625,98.588003,76.590541,69.928272,71.956926,72.594822,66.703971,72.38221,1.14395,97.338623,91.811668,99.054886,100.229416,98.702209,95.072836,102.844543,104.833572,92.979573,94.591602,99.741688,105.415525,96.598988,102.014969,106.617188,102.798376,90.283688,107.41443,79.168774,97.547726,84.770638,95.659645
50%,133.50769,133.50769,102.736556,106.012166,108.99229,117.428836,105.088474,107.695805,111.683015,112.597293,97.573131,94.376831,108.911029,97.868918,110.153555,115.713379,110.383833,111.954128,88.390354,86.284861,85.070217,100.285408,82.434999,88.986014,1.24,103.883636,102.251793,103.3209,102.268677,103.437393,97.853017,109.303719,113.616623,98.051186,105.374329,112.490631,115.092228,106.395935,111.341393,109.156631,110.284397,97.620173,116.994312,88.377968,104.366326,95.823845,103.549629
75%,198.473934,198.473934,114.090851,115.030479,115.735786,127.11222,117.031701,119.83636,121.402653,121.498141,103.877063,100.736473,118.001772,105.269576,114.585399,123.419976,115.832526,121.613878,104.378371,120.720356,100.795709,119.173439,116.413879,101.730879,1.33595,105.592884,109.928352,104.881638,103.084373,107.120308,99.859473,114.991486,120.587734,105.001174,114.061835,123.489712,125.680601,120.55838,120.20726,116.481281,117.551489,103.238771,125.975452,110.472462,112.485105,108.154121,108.861191
max,329.413367,329.413367,152.743402,136.85872,130.869962,151.297092,153.940791,153.645142,153.898678,159.495942,116.674608,115.321078,137.68259,126.338522,126.650773,165.926217,134.216163,158.055622,149.890871,173.483562,141.283339,271.079906,168.046416,135.782207,1.577,137.531616,116.581375,118.408043,113.280655,121.220627,104.549877,131.229752,149.375229,135.336934,133.378758,154.087173,160.954233,167.005081,147.265411,129.713318,152.452942,117.541371,154.834847,164.855988,141.977523,141.26973,121.495483
```


Based on this, we get the following. Unless otherwise noted, all indices are relative to 2010 = 100, so they describe changes relative to 2010 rather than absolute levels.

1. **Production and Shipments in Machinery & Electricals**  
   - China's production and shipments indices show high variability (std 78.88 around a mean of 138.30). This could reflect, for instance, fluctuations in demand, supply chain issues or policy impacts.
   - At its peak (329.41), China's index was about 3.3 times its 2010 level, which is a 229% increase.
   - On average, China's index was 38% above its 2010 level, the highest mean of all the countries. Its minimum (16.94, in February 2004) is also the lowest.
   - The CHI Production and CHI Shipments indices have identical statistics, and their values are also identical in the rows displayed earlier. They may therefore be the same series duplicated.
   - Switzerland has the lowest mean production (97.83) and shipments (94.78) indices of all the countries.
   - The UK's mean production index (108.75) is higher than its mean shipments index (95.96). Relative to 2010, UK production therefore grew more than shipments.
   - Germany's mean shipments index (114.90) is higher than its mean production index (107.50), which could indicate strong demand. Shipments are not the same as exports, so this does not necessarily point to an export focus.

2. **Material Prices**  
   - Natural gas prices (mean: 103.37, std: 41.75) are the most volatile of the commodity prices, possibly because of geopolitical factors and global supply-demand shocks.  
   - Copper (86.34) and Metals & Minerals (86.06) have the lowest means among the commodity price indices.
   - Crude oil prices have a wide range (26.62 to 168.05), likely reflecting market cycles and demand shifts.  

3. **Producer Prices vs. Production Indices (Electrical Equipment)**  
   - Germany has the highest mean producer price index (103.17), closely followed by the USA (102.19) and Italy (102.16). These indices track the prices that producers receive, so relative to 2010, electrical-equipment prices were on average highest in these countries.  
   - China has the lowest mean producer price index (97.68).
   - The UK has the highest mean electrical-equipment production index (116.67), followed by the USA (110.90). The world index averages 111.15.

4. **Machinery & Equipment Production Trends**  
   - Japan has the highest mean production index (115.08), followed by Germany (110.43) and Italy (109.49), which indicates strong growth in machinery production relative to 2010. The world index averages 112.47.  
   - Switzerland has the lowest mean production index (99.72).
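Two of the observations above can be checked in code. First, CHI Production Index and CHI Shipments Index have identical statistics in the table above, which suggests a duplicated series. Transposing the frame and calling `duplicated()` flags every column that exactly repeats an earlier one. Second, percentages such as the 229% increase follow directly from the 2010 = 100 base. The sketch below uses hypothetical toy values (rounded from the first rows shown earlier), and the helper `pct_vs_base` is illustrative.

```python
import pandas as pd

# Toy frame in which two columns carry the same series
df = pd.DataFrame({
    "CHI Production Index": [16.94, 23.71, 24.44],
    "CHI Shipments Index": [16.94, 23.71, 24.44],
    "FRA Production Index": [112.09, 136.33, 117.79],
})

# After transposing, each column is a row; duplicated() marks exact repeats
dup = df.T.duplicated()
duplicate_cols = dup[dup].index.tolist()
print(duplicate_cols)

def pct_vs_base(index_value: float, base: float = 100.0) -> float:
    """Percentage change of an index value relative to its base (2010 = 100)."""
    return (index_value / base - 1) * 100

print(round(pct_vs_base(329.413367), 1))
```

On the real data, `market_data.T.duplicated()` would list any exact duplicate columns. Those columns could then be handled in the duplicates step of section 5.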

# 4. Feature Exploration

<a href="#top">Top &#129033;</a>

## 4.1 Univariate Analysis

**Sales Data**

**Market Data**

## 4.2 Bivariate Analysis

**Sales Data**

**Market Data**

# 5. Data Cleaning & Preprocessing

## 5.1 Duplicates

**Sales Data**

**Market Data**

## 5.2 Missing Values

**Sales Data**

**Market Data**

## 5.3 Feature Engineering

**Sales Data**

**Market Data**

## 5.4 Outliers

**Sales Data**

**Market Data**