# Download the Stock Prices Data

### In this notebook we download the stock prices data that we will use in our analysis.

#### Set up environment and import required libraries

In [1]:
%matplotlib inline
%config InlineBackend.figure_format = 'retina'

import pandas as pd
import yfinance as yf
import time

import matplotlib.pyplot as plt
import warnings

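# the "seaborn" style was renamed "seaborn-v0_8" in Matplotlib 3.6; fall back for older versions
plt.style.use("seaborn-v0_8" if "seaborn-v0_8" in plt.style.available else "seaborn")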
warnings.simplefilter(action="ignore", category=FutureWarning)

# STEP 1: Download the Stock Prices Data
<font size=4>We use the ***yfinance*** library to download the data</font>

In [2]:
# Yahoo Finance symbols: ".PA" = Euronext Paris, ".MI" = Borsa Italiana (Milan)
assets = [
    "CAP.PA",
    "SAN.PA",
    "ENI.PA",
    "ENGI.PA",
    "AC.PA",
    "RACE.MI",
    "FCA.MI",
    "TIT.MI",
    "JUVE.PA",
    "BNP.PA",
    "G.MI",
]
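A hand-maintained ticker list can easily pick up repeats. yfinance only requests each unique symbol once (the output below has 11 tickers per field), but a small guard keeps the list itself in sync with the columns you get back. A minimal sketch, where `unique_tickers` is a hypothetical helper and not part of yfinance:

```python
# Hypothetical helper: drop repeated tickers while keeping their first-seen order.
def unique_tickers(tickers):
    # dict keys are unique and (since Python 3.7) preserve insertion order
    return list(dict.fromkeys(tickers))


assets_raw = ["CAP.PA", "SAN.PA", "CAP.PA", "RACE.MI", "SAN.PA", "BNP.PA"]
print(unique_tickers(assets_raw))
# → ['CAP.PA', 'SAN.PA', 'RACE.MI', 'BNP.PA']
```

Using `dict.fromkeys` rather than `set` keeps the original ordering, so the list still reads the way it was written.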

In [3]:
%%time
assets_prices = yf.download(
    assets, start="2018-01-01", end="2022-02-01", progress=False,
    auto_adjust=False,  # keep both "Close" and "Adj Close"; recent yfinance versions default to True
)

print(f"\nData size: {assets_prices.shape}\n")
assets_prices.head(10)


3 Failed downloads:
- FCA.MI: No data found, symbol may be delisted
- ENI.PA: No data found, symbol may be delisted
- JUVE.PA: No data found, symbol may be delisted

Data size: (1047, 66)

CPU times: user 314 ms, sys: 82.6 ms, total: 396 ms
Wall time: 368 ms


| Date | Adj Close AC.PA | Adj Close BNP.PA | Adj Close CAP.PA | Adj Close ENGI.PA | Adj Close ENI.PA | Adj Close FCA.MI | Adj Close G.MI | Adj Close JUVE.PA | Adj Close RACE.MI | Adj Close SAN.PA | ... | Volume BNP.PA | Volume CAP.PA | Volume ENGI.PA | Volume ENI.PA | Volume FCA.MI | Volume G.MI | Volume JUVE.PA | Volume RACE.MI | Volume SAN.PA | Volume TIT.MI |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2018-01-02 | 41.274788 | 43.358845 | 92.012756 | 9.910755 | NaN | NaN | 11.18291 | NaN | 84.322945 | 59.017155 | ... | 2785984 | 374579 | 4784430 | NaN | NaN | 5300727.0 | NaN | 511258.0 | 2239762 | 66072337.0 |
| 2018-01-03 | 41.113411 | 43.74292 | 93.871605 | 9.952543 | NaN | NaN | 11.08612 | NaN | 85.771797 | 59.27211 | ... | 3766640 | 518773 | 5666606 | NaN | NaN | 8118315.0 | NaN | 427199.0 | 2166624 | 54839157.0 |
| 2018-01-04 | 41.388699 | 44.532028 | 94.196899 | 10.10925 | NaN | NaN | 11.168019 | NaN | 89.345627 | 60.036961 | ... | 4272372 | 462982 | 5665448 | NaN | NaN | 7907838.0 | NaN | 1056023.0 | 2738123 | 59740478.0 |
| 2018-01-05 | 41.550079 | 44.615822 | 95.265739 | 10.164967 | NaN | NaN | 11.316926 | NaN | 90.166634 | 61.15546 | ... | 3125691 | 525044 | 5086093 | NaN | NaN | 4851123.0 | NaN | 587577.0 | 2568891 | 58871069.0 |
| 2018-01-08 | 42.129143 | 45.041805 | 95.172798 | 10.231131 | NaN | NaN | 11.324371 | NaN | 91.422302 | 61.147232 | ... | 3030090 | 318211 | 4068532 | NaN | NaN | 4876545.0 | NaN | 578255.0 | 1837618 | 126179734.0 |
| 2018-01-09 | 42.76516 | 45.691242 | 97.031631 | 10.224166 | NaN | NaN | 11.465833 | NaN | 92.629684 | 61.056767 | ... | 3474952 | 660337 | 6225347 | NaN | NaN | 6059050.0 | NaN | 560611.0 | 2291517 | 51273854.0 |
| 2018-01-10 | 42.850597 | 46.389565 | 96.799294 | 10.203273 | NaN | NaN | 11.718975 | NaN | 92.388214 | 60.316582 | ... | 4849727 | 711395 | 7537815 | NaN | NaN | 9641025.0 | NaN | 574892.0 | 2567570 | 72568195.0 |
| 2018-01-11 | 43.306248 | 46.787605 | 97.914589 | 10.217202 | NaN | NaN | 11.704085 | NaN | 93.643875 | 59.954723 | ... | 4473175 | 513773 | 5333467 | NaN | NaN | 5891548.0 | NaN | 524446.0 | 2308330 | 70557590.0 |
| 2018-01-12 | 43.477123 | 47.108833 | 98.844009 | 10.140591 | NaN | NaN | 11.666858 | NaN | 93.788757 | 60.587982 | ... | 4306247 | 377957 | 5379458 | NaN | NaN | 7073587.0 | NaN | 536735.0 | 2300339 | 61980877.0 |
| 2018-01-15 | 43.657486 | 47.157719 | 98.797546 | 10.088355 | NaN | NaN | 11.674303 | NaN | 93.595589 | 60.201443 | ... | 2109344 | 348022 | 4837624 | NaN | NaN | 3333390.0 | NaN | 319614.0 | 1558363 | 59521551.0 |
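yfinance logs the tickers it could not fetch (the three "Failed downloads" above) and leaves their columns in the frame filled with NaN, as the empty ENI.PA, FCA.MI and JUVE.PA columns show. A minimal sketch for finding those tickers from the data itself; `tickers_without_data` is a hypothetical helper, and the small frame below is an illustrative stand-in that only assumes the same `(field, ticker)` column layout as `assets_prices`:

```python
import numpy as np
import pandas as pd

# Illustrative stand-in for assets_prices: same (field, ticker) column layout
dates = pd.DatetimeIndex(["2018-01-02", "2018-01-03"], name="Date")
columns = pd.MultiIndex.from_product([["Close", "Volume"], ["AC.PA", "ENI.PA", "G.MI"]])
demo = pd.DataFrame(
    [[43.48, np.nan, 15.02, 1.0e6, np.nan, 5.3e6],
     [43.31, np.nan, 14.89, 1.1e6, np.nan, 8.1e6]],
    index=dates,
    columns=columns,
)


def tickers_without_data(prices, field="Close"):
    """Return the tickers whose `field` column is NaN on every date."""
    field_frame = prices[field]              # selecting a first-level key leaves one column per ticker
    all_missing = field_frame.isna().all()   # boolean Series indexed by ticker
    return all_missing[all_missing].index.tolist()


print(tickers_without_data(demo))  # → ['ENI.PA']
```

On the real `assets_prices` this should flag the same tickers as the failed-download log, which makes the check usable in scripts where nobody reads the console output.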


Let's have a look at the columns.

In [4]:
assets_prices.columns

MultiIndex([('Adj Close',   'AC.PA'),
            ('Adj Close',  'BNP.PA'),
            ('Adj Close',  'CAP.PA'),
            ('Adj Close', 'ENGI.PA'),
            ('Adj Close',  'ENI.PA'),
            ('Adj Close',  'FCA.MI'),
            ('Adj Close',    'G.MI'),
            ('Adj Close', 'JUVE.PA'),
            ('Adj Close', 'RACE.MI'),
            ('Adj Close',  'SAN.PA'),
            ('Adj Close',  'TIT.MI'),
            (    'Close',   'AC.PA'),
            (    'Close',  'BNP.PA'),
            (    'Close',  'CAP.PA'),
            (    'Close', 'ENGI.PA'),
            (    'Close',  'ENI.PA'),
            (    'Close',  'FCA.MI'),
            (    'Close',    'G.MI'),
            (    'Close', 'JUVE.PA'),
            (    'Close', 'RACE.MI'),
            (    'Close',  'SAN.PA'),
            (    'Close',  'TIT.MI'),
            (     'High',   'AC.PA'),
            (     'High',  'BNP.PA'),
            (     'High',  'CAP.PA'),
            (     'High', 'ENGI.PA'),
            

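We observe that the columns form a two-level MultiIndex: the first level is the price field and the second is the ticker. For each asset and each trading day, the data contain:

- `Open`: opening price
- `Close`: closing price
- `Adj Close`: closing price adjusted for dividends and stock splits
- `High`: highest price of the day
- `Low`: lowest price of the day
- `Volume`: number of shares traded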
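The two levels can be listed directly from the column index, which also explains the shape above: 6 fields × 11 unique tickers = 66 columns. A minimal sketch on an illustrative frame with the same layout (only two tickers, to keep it short):

```python
import pandas as pd

# Illustrative columns mimicking the (field, ticker) layout of assets_prices
columns = pd.MultiIndex.from_product(
    [["Adj Close", "Close", "High", "Low", "Open", "Volume"], ["AC.PA", "BNP.PA"]]
)
demo = pd.DataFrame(0.0, index=pd.RangeIndex(3), columns=columns)

# Level 0 holds the price fields, level 1 the tickers
fields = demo.columns.get_level_values(0).unique().tolist()
tickers = demo.columns.get_level_values(1).unique().tolist()
print(fields)   # → ['Adj Close', 'Close', 'High', 'Low', 'Open', 'Volume']
print(tickers)  # → ['AC.PA', 'BNP.PA']

# A cross-section on level 0 returns one field for every ticker
high = demo.xs("High", axis=1, level=0)
print(high.columns.tolist())  # → ['AC.PA', 'BNP.PA']
```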

# STEP 2: Quick Data Processing
<font size=4>Here, we keep only the closing price (`Close`) for our analysis.</font>

In [5]:
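# each column label c is a (field, ticker) tuple, so `in` matches the "Close" field exactly and skips "Adj Close"
close_cols = [c for c in assets_prices.columns if "Close" in c]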
close_cols

[('Close', 'AC.PA'),
 ('Close', 'BNP.PA'),
 ('Close', 'CAP.PA'),
 ('Close', 'ENGI.PA'),
 ('Close', 'ENI.PA'),
 ('Close', 'FCA.MI'),
 ('Close', 'G.MI'),
 ('Close', 'JUVE.PA'),
 ('Close', 'RACE.MI'),
 ('Close', 'SAN.PA'),
 ('Close', 'TIT.MI')]
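The list comprehension works because `"Close" in c` tests whether any element of the tuple *equals* `"Close"`; it is not a substring search. That is why `Adj Close` is excluded above. A small check of that behaviour:

```python
# Each column label is a (field, ticker) tuple, so `in` tests element equality
label_close = ("Close", "AC.PA")
label_adj = ("Adj Close", "AC.PA")

print("Close" in label_close)  # → True: one element equals "Close"
print("Close" in label_adj)    # → False: "Adj Close" != "Close"

# A substring test on each element would also pick up "Adj Close"
print(any("Close" in part for part in label_adj))  # → True
```

If the labels were ever flattened into plain strings such as `"Close_AC.PA"`, the same comprehension would start matching `Adj Close` columns as well, so it relies on the MultiIndex being intact.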

In [6]:
assets_prices_close = assets_prices[close_cols]

print(f"\nData size: {assets_prices_close.shape}\n")
assets_prices_close.head(10)


Data size: (1047, 11)



| Date | Close AC.PA | Close BNP.PA | Close CAP.PA | Close ENGI.PA | Close ENI.PA | Close FCA.MI | Close G.MI | Close JUVE.PA | Close RACE.MI | Close SAN.PA | Close TIT.MI |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 2018-01-02 | 43.48 | 62.09 | 99.0 | 14.23 | NaN | NaN | 15.02 | NaN | 87.300003 | 71.760002 | 0.7255 |
| 2018-01-03 | 43.310001 | 62.639999 | 101.0 | 14.29 | NaN | NaN | 14.89 | NaN | 88.800003 | 72.07 | 0.725 |
| 2018-01-04 | 43.599998 | 63.77 | 101.349998 | 14.515 | NaN | NaN | 15.0 | NaN | 92.5 | 73.0 | 0.734 |
| 2018-01-05 | 43.77 | 63.889999 | 102.5 | 14.595 | NaN | NaN | 15.2 | NaN | 93.349998 | 74.360001 | 0.7385 |
| 2018-01-08 | 44.380001 | 64.5 | 102.400002 | 14.69 | NaN | NaN | 15.21 | NaN | 94.650002 | 74.349998 | 0.7525 |
| 2018-01-09 | 45.049999 | 65.43 | 104.400002 | 14.68 | NaN | NaN | 15.4 | NaN | 95.900002 | 74.239998 | 0.752 |
| 2018-01-10 | 45.139999 | 66.43 | 104.150002 | 14.65 | NaN | NaN | 15.74 | NaN | 95.650002 | 73.339996 | 0.752 |
| 2018-01-11 | 45.619999 | 67.0 | 105.349998 | 14.67 | NaN | NaN | 15.72 | NaN | 96.949997 | 72.900002 | 0.748 |
| 2018-01-12 | 45.799999 | 67.459999 | 106.349998 | 14.56 | NaN | NaN | 15.67 | NaN | 97.099998 | 73.669998 | 0.7435 |
| 2018-01-15 | 45.990002 | 67.529999 | 106.300003 | 14.485 | NaN | NaN | 15.68 | NaN | 96.900002 | 73.199997 | 0.7385 |


Now that only the closing prices remain, we drop the first column level (`Close`) so that each column is labelled by its ticker alone.

In [7]:
assets_prices_close = assets_prices_close.droplevel(0, axis=1)

print(f"\nData size: {assets_prices_close.shape}\n")
assets_prices_close.head(10)


Data size: (1047, 11)



| Date | AC.PA | BNP.PA | CAP.PA | ENGI.PA | ENI.PA | FCA.MI | G.MI | JUVE.PA | RACE.MI | SAN.PA | TIT.MI |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 2018-01-02 | 43.48 | 62.09 | 99.0 | 14.23 | NaN | NaN | 15.02 | NaN | 87.300003 | 71.760002 | 0.7255 |
| 2018-01-03 | 43.310001 | 62.639999 | 101.0 | 14.29 | NaN | NaN | 14.89 | NaN | 88.800003 | 72.07 | 0.725 |
| 2018-01-04 | 43.599998 | 63.77 | 101.349998 | 14.515 | NaN | NaN | 15.0 | NaN | 92.5 | 73.0 | 0.734 |
| 2018-01-05 | 43.77 | 63.889999 | 102.5 | 14.595 | NaN | NaN | 15.2 | NaN | 93.349998 | 74.360001 | 0.7385 |
| 2018-01-08 | 44.380001 | 64.5 | 102.400002 | 14.69 | NaN | NaN | 15.21 | NaN | 94.650002 | 74.349998 | 0.7525 |
| 2018-01-09 | 45.049999 | 65.43 | 104.400002 | 14.68 | NaN | NaN | 15.4 | NaN | 95.900002 | 74.239998 | 0.752 |
| 2018-01-10 | 45.139999 | 66.43 | 104.150002 | 14.65 | NaN | NaN | 15.74 | NaN | 95.650002 | 73.339996 | 0.752 |
| 2018-01-11 | 45.619999 | 67.0 | 105.349998 | 14.67 | NaN | NaN | 15.72 | NaN | 96.949997 | 72.900002 | 0.748 |
| 2018-01-12 | 45.799999 | 67.459999 | 106.349998 | 14.56 | NaN | NaN | 15.67 | NaN | 97.099998 | 73.669998 | 0.7435 |
| 2018-01-15 | 45.990002 | 67.529999 | 106.300003 | 14.485 | NaN | NaN | 15.68 | NaN | 96.900002 | 73.199997 | 0.7385 |
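The two steps used here (selecting the `Close` tuples, then `droplevel`) can also be collapsed into one: indexing a MultiIndex-column frame with a first-level key returns the sub-frame with that level already removed. A minimal sketch on an illustrative frame, checking that both routes give the same result:

```python
import numpy as np
import pandas as pd

# Illustrative frame with the same (field, ticker) column layout
columns = pd.MultiIndex.from_product([["Adj Close", "Close"], ["AC.PA", "BNP.PA"]])
demo = pd.DataFrame(np.arange(8.0).reshape(2, 4), columns=columns)

# Two-step approach used in this notebook
close_cols = [c for c in demo.columns if "Close" in c]
two_step = demo[close_cols].droplevel(0, axis=1)

# One-step alternative: a first-level key drops that level automatically
one_step = demo["Close"]

pd.testing.assert_frame_equal(two_step, one_step)
print(one_step.columns.tolist())  # → ['AC.PA', 'BNP.PA']
```

The explicit two-step version is arguably easier to follow in a tutorial; the one-step form is shorter once the MultiIndex layout is familiar.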


### Save the data

In [8]:
assets_prices_close.to_csv("../data/assets-prices.csv")
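`to_csv` fails if the `../data` folder does not exist, and reading the file back needs to restore the `Date` index as dates. A minimal round-trip sketch; `save_prices` and `load_prices` are hypothetical helpers, and the temporary folder stands in for `../data`:

```python
from pathlib import Path
import tempfile

import numpy as np
import pandas as pd


def save_prices(prices, path):
    """Write prices to CSV, creating the parent folder if it is missing."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    prices.to_csv(path)


def load_prices(path):
    """Read the CSV back, parsing the 'Date' column into a DatetimeIndex."""
    return pd.read_csv(path, index_col="Date", parse_dates=True)


# Illustrative data: one ticker with prices, one with no data at all
demo = pd.DataFrame(
    {"AC.PA": [43.48, 43.310001], "ENI.PA": [np.nan, np.nan]},
    index=pd.DatetimeIndex(["2018-01-02", "2018-01-03"], name="Date"),
)

with tempfile.TemporaryDirectory() as tmp:
    csv_path = Path(tmp) / "data" / "assets-prices.csv"  # "data" does not exist yet
    save_prices(demo, csv_path)
    reloaded = load_prices(csv_path)

print(type(reloaded.index).__name__)  # → DatetimeIndex
print(reloaded.columns.tolist())      # → ['AC.PA', 'ENI.PA']
```

Columns with no data survive the round trip as all-NaN columns, so any filtering of failed tickers has to happen before or after saving, not by the CSV itself.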