# Goal

- Show daily average data on the parallel exchange rate between the Argentinian peso and the US dollar
- Render basic financial indicators for this parallel exchange rate
- Using the front pages of Argentinian newspapers, predict the rise or fall of the parallel exchange rate

### Warning

The parallel ARSUSD exchange rate is not the rate that appears in a Google search for ARSUSD. It is the rate used in day-to-day transactions between third parties in Argentina, and it is also called the "dollar blue" or "informal dollar". From now on, I am going to call it the "informal dollar".

## Why am I doing this?

I think this is an interesting project for a few reasons:

- This exchange rate is not available in most trading tools, because it is not the dollar exchange rate that companies or the government use. However, it is widely used to set the prices of most other products inside Argentina, and it shapes the daily lives of Argentinian people. It has been called "the stability thermometer of the country".

- Also, from a purely ML/DS perspective, is it even possible to predict the behaviour of this exchange rate based on newspaper front pages? I am not sure, but I am willing to try and see.

First, let's capture the informal dollar prices from a known source.

In [23]:
!pip3 install pandas
!pip3 install plotly
!pip3 install jupyter_dash



In [8]:
from dash import Dash, html, dcc
import plotly.express as px
from jupyter_dash import JupyterDash

In [9]:
import ast
import pandas as pd

In [10]:
#add to the cell before
import os
import json
import datetime
import time

In [26]:
!wget 'https://mercados.ambito.com//dolar/informal/historico-general/01-01-2000/27-02-2022' -O raw_dolar_data

--2022-03-06 16:34:00--  https://mercados.ambito.com//dolar/informal/historico-general/01-01-2000/27-02-2022
Resolving mercados.ambito.com (mercados.ambito.com)... 151.139.128.11
Connecting to mercados.ambito.com (mercados.ambito.com)|151.139.128.11|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 150482 (147K) [application/json]
Saving to: ‘raw_dolar_data’


2022-03-06 16:34:01 (285 KB/s) - ‘raw_dolar_data’ saved [150482/150482]



In [27]:
# read the downloaded file; the context manager closes it afterwards
with open('raw_dolar_data', 'r') as dolar_price_raw_file:
    raw = dolar_price_raw_file.read()



In [29]:
dolar_prices = ast.literal_eval(raw)

In [30]:
dolar_prices.pop(0)

['Fecha', 'Compra', 'Venta']
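
Since the server reports the response as `application/json`, the standard `json` module could probably parse it as well. The following is only a minimal sketch under assumptions: `sample_raw` is an illustrative stand-in for the downloaded file (not the actual response), and `parse_rows` is a hypothetical helper that assumes a header row followed by rows with `dd-mm-YYYY` dates and decimal-comma prices.

```python
import json
from datetime import datetime

# Illustrative stand-in for the contents of raw_dolar_data (assumed layout)
sample_raw = (
    '[["Fecha", "Compra", "Venta"], '
    '["25-02-2022", "207,00", "211,00"], '
    '["24-02-2022", "206,50", "210,50"]]'
)


def parse_rows(raw_text):
    """Parse the JSON payload into (date, buy_price, sell_price) tuples."""
    rows = json.loads(raw_text)
    parsed = []
    for date_str, buy, sell in rows[1:]:  # skip the header row
        parsed.append((
            datetime.strptime(date_str, "%d-%m-%Y"),
            float(buy.replace(",", ".")),   # decimal comma -> decimal point
            float(sell.replace(",", ".")),
        ))
    return parsed


rows = parse_rows(sample_raw)
print(rows[0])
```

The resulting tuples can be passed to `pd.DataFrame` with the same column names used in the next cell.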

In [31]:
dollar_df = pd.DataFrame(dolar_prices, columns=['date','buy_price', 'sell_price'])

In [32]:
dollar_df['date'] = pd.to_datetime(dollar_df['date'], format="%d-%m-%Y")

In [33]:
dollar_df['buy_price']= [float(str(i).replace(',','.')) for i in dollar_df['buy_price']]
dollar_df['sell_price']= [float(str(i).replace(',','.')) for i in dollar_df['sell_price']]

In [34]:
dollar_df

           date  buy_price  sell_price
0    2022-02-25     207.00      211.00
1    2022-02-24     206.50      210.50
2    2022-02-23     206.00      210.00
3    2022-02-22     206.00      210.00
4    2022-02-21     207.50      211.50
...         ...        ...         ...
5010 2002-01-17       1.92        1.97
5011 2002-01-16       1.83        1.87
5012 2002-01-15       1.90        1.95
5013 2002-01-14       1.63        1.68


In [35]:
dollar_df.dtypes

date          datetime64[ns]
buy_price            float64
sell_price           float64
dtype: object

In [36]:
dollar_df['avg'] = (dollar_df['buy_price']+dollar_df['sell_price'])/2

In [37]:
dollar_df

           date  buy_price  sell_price      avg
0    2022-02-25     207.00      211.00  209.000
1    2022-02-24     206.50      210.50  208.500
2    2022-02-23     206.00      210.00  208.000
3    2022-02-22     206.00      210.00  208.000
4    2022-02-21     207.50      211.50  209.500
...         ...        ...         ...      ...
5010 2002-01-17       1.92        1.97    1.945
5011 2002-01-16       1.83        1.87    1.850
5012 2002-01-15       1.90        1.95    1.925
5013 2002-01-14       1.63        1.68    1.655
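
One of the goals is to render basic financial indicators for this rate. As a rough sketch of what that could look like, the snippet below computes a simple moving average and the daily percentage change on a small sample frame built from the first rows shown above. The helper `add_indicators`, the column names `sma` and `daily_change_pct`, and the 3-day window are assumptions chosen for illustration, not a final choice of indicators.

```python
import pandas as pd

# Small illustrative frame with the same columns as dollar_df (newest row first)
sample_df = pd.DataFrame({
    "date": pd.to_datetime(
        ["2022-02-25", "2022-02-24", "2022-02-23", "2022-02-22", "2022-02-21"]
    ),
    "avg": [209.0, 208.5, 208.0, 208.0, 209.5],
})


def add_indicators(df, window=3):
    """Return a copy sorted by date with a moving average and daily % change."""
    # Rolling calculations need chronological order, oldest row first
    out = df.sort_values("date").reset_index(drop=True)
    out["sma"] = out["avg"].rolling(window=window).mean()
    out["daily_change_pct"] = out["avg"].pct_change() * 100
    return out


indicators = add_indicators(sample_df)
print(indicators)
```

The sort matters because the source data is ordered from newest to oldest, and a rolling window over that order would look into the future.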


## Data exploration

We will be using the Dash library to generate an environment where Plotly graphs will be rendered. 

In [38]:
app = JupyterDash(__name__)

In [46]:
fig = px.line(dollar_df, x='date', y='avg',labels=dict(date='Date', avg='ARSUSD informal rate'))

In [47]:

app.layout = html.Div([
    dcc.Graph(figure=fig),
])

In [48]:
app.run_server(mode="inline")


The 'environ['werkzeug.server.shutdown']' function is deprecated and will be removed in Werkzeug 2.1.



# Argentinian Newspaper front pages

## Extraction

I will be using a newspaper that offers a service to see the front page for every day. As we don't want to overuse a free service, I will keep a file that tracks the last downloaded picture, so that every time this notebook is run, only the new pages are downloaded.

In [11]:
from PIL import Image
import requests
from io import BytesIO

In [16]:
today = datetime.datetime.today()
date_add = datetime.timedelta(days=1)
# default start date, used when nothing has been downloaded yet
date_cursor = datetime.datetime(2000, 1, 2)
last_record = {'last_record': None}

# if a previous run left a record, resume from the day after the last saved page
if os.path.exists('./records.json'):
    with open('records.json', 'r') as file:
        last_record = json.load(file)
    if last_record['last_record']:
        date_cursor = datetime.datetime.strptime(last_record['last_record'], '%Y%m%d') + date_add

print(date_cursor)
a = 0
# dates whose front page could not be downloaded are appended to this file
with open('not_available_list', 'a') as not_available_list:
    # the counter 'a' limits each run to 10 requests
    while date_cursor < today and a < 10:
        a += 1
        date_string = date_cursor.strftime('%Y/%m/%d/%Y%m%d')
        request_uri = f'https://tapas.clarin.com/tapa/{date_string}_thumb.jpg'
        print(request_uri)
        r = requests.get(request_uri)
        filename = date_cursor.strftime('%Y%m%d')
        if r.status_code == requests.codes.ok:
            i = Image.open(BytesIO(r.content))
            i.save(f'./{filename}.jpg')
            last_record['last_record'] = filename
        else:
            print('Not available')
            not_available_list.write(filename + '\n')
        date_cursor = date_cursor + date_add

# store the last downloaded date so the next run only fetches newer pages
with open('records.json', 'w') as file:
    json.dump(last_record, file)

2000-01-02 00:00:00
https://tapas.clarin.com/tapa/2000/01/02/20000102_thumb.jpg
https://tapas.clarin.com/tapa/2000/01/03/20000103_thumb.jpg
https://tapas.clarin.com/tapa/2000/01/04/20000104_thumb.jpg
https://tapas.clarin.com/tapa/2000/01/05/20000105_thumb.jpg
https://tapas.clarin.com/tapa/2000/01/06/20000106_thumb.jpg
https://tapas.clarin.com/tapa/2000/01/07/20000107_thumb.jpg
https://tapas.clarin.com/tapa/2000/01/08/20000108_thumb.jpg
https://tapas.clarin.com/tapa/2000/01/09/20000109_thumb.jpg
https://tapas.clarin.com/tapa/2000/01/10/20000110_thumb.jpg
https://tapas.clarin.com/tapa/2000/01/11/20000111_thumb.jpg
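
The record-keeping idea from this section (only request the days after the last downloaded front page) can also be sketched as two small functions that need no network access, which makes the date logic easy to check on its own. `front_page_url` and `pending_days` are hypothetical helper names, and the sketch assumes the record is stored as a `YYYYMMDD` string, the same format used for the saved file names.

```python
import datetime


def front_page_url(day):
    """Build the thumbnail URL for a given date, following the URL pattern used above."""
    return f"https://tapas.clarin.com/tapa/{day.strftime('%Y/%m/%d/%Y%m%d')}_thumb.jpg"


def pending_days(last_record, today, limit=10):
    """List up to `limit` days after `last_record` ('YYYYMMDD') and before `today`."""
    one_day = datetime.timedelta(days=1)
    cursor = datetime.datetime.strptime(last_record, "%Y%m%d") + one_day
    days = []
    while cursor < today and len(days) < limit:
        days.append(cursor)
        cursor += one_day
    return days


# Example: the last saved page is 2000-01-11 and "today" is 2000-01-14
days = pending_days("20000111", datetime.datetime(2000, 1, 14))
for day in days:
    print(front_page_url(day))
```

Keeping the date arithmetic separate from the download step means the resume behaviour can be tested without sending any request to the newspaper's service.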


# Transformation

There are many possible approaches here. The most naive and simplest of them, and in my opinion the one most likely to fail, is feeding the pictures directly into a neural network and trying to predict the behaviour of the informal dollar.

I think this would be a bit tough for most models, so my plan is to transform the pictures into text first. For this, I will use RLSA (the Run Length Smoothing Algorithm) to detect the text sections and extract them, and then pytesseract to turn the images of text into actual text.
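
To give an idea of the text-detection step, here is a minimal sketch of a horizontal smoothing pass, assuming the algorithm meant here is the Run Length Smoothing Algorithm (RLSA). It works on a tiny binary "image" written as nested lists, where 1 is ink and 0 is background. This is a simplified variant that only fills short gaps lying between two ink pixels in the same row; `smooth_row` and `smooth_horizontal` are hypothetical names, and a real pipeline would first binarize the front page and would likely use an existing implementation.

```python
def smooth_row(row, threshold):
    """Fill background runs (0) no longer than `threshold` that sit between ink pixels (1)."""
    out = list(row)
    last_ink = None  # index of the most recent ink pixel seen so far
    for idx, pixel in enumerate(row):
        if pixel == 1:
            gap = idx - last_ink - 1 if last_ink is not None else 0
            if 0 < gap <= threshold:
                # Short gap between two ink pixels: merge them into one block
                for j in range(last_ink + 1, idx):
                    out[j] = 1
            last_ink = idx
    return out


def smooth_horizontal(image, threshold):
    """Apply the row smoothing to every row of a binary image (list of lists)."""
    return [smooth_row(row, threshold) for row in image]


# Tiny binary "image": 1 = ink, 0 = background
image = [
    [1, 0, 0, 1, 0, 0, 0, 0, 1],
    [0, 0, 1, 0, 1, 0, 0, 0, 0],
]
smoothed = smooth_horizontal(image, threshold=2)
print(smoothed)
```

With a suitable threshold, letters and words that are close together merge into solid blocks, and those blocks are what would be cropped and passed to the OCR step.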

## Why am I not using archives of digital newspapers?

Well, I am interested in seeing whether the data on a front page alone is enough to predict the price movements. Newspapers are not just text: they have specific layouts. Articles, titles and images are given a certain size, and different colors are used. I will try to take all of this into account when feeding the data into the NN.

The transformation step will be quite large. Let's start.