# Algorithme de trading

Dans cette dernière partie du projet, nous récupérons les deux modèles entraînés ainsi que les articles de presse, les posts Reddit et le cours de l'action Apple pour conseiller à notre utilisateur d'acheter ou non une action Apple avant que le marché n'ouvre.

### Renseignez la date du jour

In [40]:
from datetime import datetime, timedelta


#On demande à l'utilisateur de saisir une date

user_date_input = input("Entrez la date d'aujourd'hui (format AAAA-MM-JJ): ")
current_date = datetime.strptime(user_date_input, "%Y-%m-%d")

## Récupération des articles de presse de Forbes des 48h précédentes

In [41]:
import requests

start_date = current_date - timedelta(days=2)
end_date = current_date
parameters = {
    'q': 'Apple',
    'domains': 'forbes.com',
    'from': start_date.strftime("%Y-%m-%d"),
    'to': end_date.strftime("%Y-%m-%d"),
    'sortBy': 'publishedAt',
    'apiKey': '1b20fb6f9b9d40f0b4e4ad6fe5d90755'  
}

url = 'https://newsapi.org/v2/everything'


response = requests.get(url, params=parameters)
articles = response.json().get('articles', [])  

for article in articles:
    title = article.get('title', 'Titre indisponible')  
    source_name = article.get('source', {}).get('name', 'Source indisponible') 
    url = article.get('url', 'URL indisponible') 
    print(f"Titre: {title}\nSource: {source_name}\nURL: {url}\n")

Titre: Top Italian Wines Of 2023
Source: Forbes
URL: https://www.forbes.com/sites/tomhyland/2023/12/27/top-italian-wines-of-2023/

Titre: Legendary’s Expanding MonsterVerse Is A Masterclass In Universe Building
Source: Forbes
URL: https://www.forbes.com/sites/robsalkowitz/2023/12/27/legendarys-expanding-monsterverse-is-a-masterclass-in-universe-building/

Titre: How The C-Suite Can Advance Mental Health And Increase Employee Happiness?
Source: Forbes
URL: https://www.forbes.com/sites/cindygordon/2023/12/27/how-the-c-suite-can-advance-mental-health-and-increase-employee-happiness/

Titre: Ban On Apple Watch Sales Paused By Appeals Court
Source: Forbes
URL: https://www.forbes.com/sites/zacharyfolk/2023/12/27/ban-on-apple-watch-sales-paused-by-appeals-court/

Titre: Apple Watch Series 9 And Ultra 2 Could Suddenly Be Back On Sale Amid News Of Redesign
Source: Forbes
URL: https://www.forbes.com/sites/davidphelan/2023/12/27/apple-watch-series-9-and-ultra-2-could-suddenly-be-back-on-sale-amid

### Nettoyage des articles

In [42]:
import csv
import pandas as pd

with open('articles_algo.csv', mode='w', newline='', encoding='utf-8') as file:
    writer = csv.writer(file)
    writer.writerow(['Titre', 'URL', 'Date de publication', 'Description','Source'])
    for article in articles:
        if "Apple" not in article['title']:
          continue
        writer.writerow([article['title'], article['url'], article['publishedAt'], article['description'], article['source']['name']])

df=pd.read_csv('articles_algo.csv')

urls = [url for url in df['URL']]
html_contents = []
compteur=0
df['Contenu']=[None] * len(df)

for url in urls:
    response = requests.get(url)
    if response.status_code == 200:
        html_contents.append(response.text)
        df['Contenu'].loc[compteur]=response.text
    else:
        print(f"Échec de récupération pour {url}")
    compteur+=1

df.head()

Unnamed: 0,Titre,URL,Date de publication,Description,Source,Contenu
0,Ban On Apple Watch Sales Paused By Appeals Court,https://www.forbes.com/sites/zacharyfolk/2023/...,2023-12-27T17:43:40Z,Apple won a temporary victory in their patent ...,Forbes,"<!DOCTYPE html><html lang=""en""><head><link rel..."
1,Apple Watch Series 9 And Ultra 2 Could Suddenl...,https://www.forbes.com/sites/davidphelan/2023/...,2023-12-27T17:16:59Z,The ban on Apple selling the Apple Watch Serie...,Forbes,"<!DOCTYPE html><html lang=""en""><head><link rel..."
2,Apple Vision Pro 2024 Release Date: Mass Shipm...,https://www.forbes.com/sites/davidphelan/2023/...,2023-12-27T10:00:33Z,The next new Apple product could be with us so...,Forbes,"<!DOCTYPE html><html lang=""en""><head><link rel..."
3,Apple Appeals As Watch Series 9 And Ultra 2 Sa...,https://www.forbes.com/sites/andrewwilliams/20...,2023-12-26T17:45:03Z,Apple has launched an appeal to get its Watch ...,Forbes,"<!DOCTYPE html><html lang=""en""><head><link rel..."
4,Apple Watch Import Ban Starts Today—Here’s Wha...,https://www.forbes.com/sites/jamesfarrell/2023...,2023-12-26T17:09:29Z,Imports and sales of Apple Watches with blood ...,Forbes,"<!DOCTYPE html><html lang=""en""><head><link rel..."


In [43]:
import nltk
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer
from nltk.tokenize import word_tokenize
import re
from bs4 import BeautifulSoup

def cleaning_text(text):
    #Passage du texte en miniscules
    text=text.lower()
    #Suppression des chiffres
    text=re.sub(r'\d+', '', text)
    #Suppression de la ponctuation et des symboles spéciaux
    text=re.sub(r'[^\w\s]', '', text)
    return text


stops = set(stopwords.words('english'))
lemmatizer = WordNetLemmatizer()
def clean_stopwords(text):
    words = word_tokenize(text)
    cleaned_text=[word for word in words if word not in stops]
    return ' '.join(cleaned_text)

#Fonction adaptée à la structure des articles Forbes
def clean_html_1(text_html): 
    soup=BeautifulSoup(text_html, 'html.parser')
    title=soup.find_all('h1', class_=True)
    content=soup.find_all('p')
    united_content = ' '.join(el.get_text(strip=True) for el in title + content )
    return united_content


In [44]:
if len(df)>0:
    df['Cleaned_html_content'] = df.apply(lambda row: clean_html_1(row['Contenu']), axis=1)
    df['Content_cleaned']=df['Cleaned_html_content'].apply(cleaning_text)
    df['Content_cleaned_from_stopwords']=df['Content_cleaned'].apply(clean_stopwords)

### Analyse de sentiments avec un modèle pré-entraîné

In [45]:
from transformers import pipeline

classifier = pipeline('sentiment-analysis')

sentiments=[]
#Comme nous l'avions vu précédemment, il faut tronquer les textes pour les réduire à 512 tokens au risque d'obtenir un message d'erreur
for i in range(len(df)):
    text = df.iloc[i,8]
    text = text[:512]
    sentiment = classifier(text)[0]['label']  
    sentiments.append(sentiment)
df['predicted_sentiment'] = sentiments
df.head()



No model was supplied, defaulted to distilbert-base-uncased-finetuned-sst-2-english and revision af0f99b (https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.
All PyTorch model weights were used when initializing TFDistilBertForSequenceClassification.

All the weights of TFDistilBertForSequenceClassification were initialized from the PyTorch model.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFDistilBertForSequenceClassification for predictions without further training.


Unnamed: 0,Titre,URL,Date de publication,Description,Source,Contenu,Cleaned_html_content,Content_cleaned,Content_cleaned_from_stopwords,predicted_sentiment
0,Ban On Apple Watch Sales Paused By Appeals Court,https://www.forbes.com/sites/zacharyfolk/2023/...,2023-12-27T17:43:40Z,Apple won a temporary victory in their patent ...,Forbes,"<!DOCTYPE html><html lang=""en""><head><link rel...",Ban On Apple Watch Sales Paused By Appeals Cou...,ban on apple watch sales paused by appeals cou...,ban apple watch sales paused appeals court app...,NEGATIVE
1,Apple Watch Series 9 And Ultra 2 Could Suddenl...,https://www.forbes.com/sites/davidphelan/2023/...,2023-12-27T17:16:59Z,The ban on Apple selling the Apple Watch Serie...,Forbes,"<!DOCTYPE html><html lang=""en""><head><link rel...",Apple Watch Series 9 And Ultra 2 Could Suddenl...,apple watch series and ultra could suddenly ...,apple watch series ultra could suddenly go bac...,NEGATIVE
2,Apple Vision Pro 2024 Release Date: Mass Shipm...,https://www.forbes.com/sites/davidphelan/2023/...,2023-12-27T10:00:33Z,The next new Apple product could be with us so...,Forbes,"<!DOCTYPE html><html lang=""en""><head><link rel...",Apple Vision Pro 2024 Release Date: Mass Shipm...,apple vision pro release date mass shipments ...,apple vision pro release date mass shipments d...,NEGATIVE
3,Apple Appeals As Watch Series 9 And Ultra 2 Sa...,https://www.forbes.com/sites/andrewwilliams/20...,2023-12-26T17:45:03Z,Apple has launched an appeal to get its Watch ...,Forbes,"<!DOCTYPE html><html lang=""en""><head><link rel...",Apple Appeals As Watch Series 9 And Ultra 2 Sa...,apple appeals as watch series and ultra sale...,apple appeals watch series ultra sales banned ...,NEGATIVE
4,Apple Watch Import Ban Starts Today—Here’s Wha...,https://www.forbes.com/sites/jamesfarrell/2023...,2023-12-26T17:09:29Z,Imports and sales of Apple Watches with blood ...,Forbes,"<!DOCTYPE html><html lang=""en""><head><link rel...",Apple Watch Import Ban Starts Today—Here’s Wha...,apple watch import ban starts todayheres what ...,apple watch import ban starts todayheres know ...,NEGATIVE


On récupère le pourcentage d'articles à connotation positive parmi les articles qu'on a récupérés.

In [46]:
rate_pos = (df['predicted_sentiment'] == 'POSITIVE').mean()

print(f"Taux de prédictions 'POSITIVE': {rate_pos:}")


Taux de prédictions 'POSITIVE': 0.0


## Récupération des posts Reddit

In [47]:
pip install praw

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)


Defaulting to user installation because normal site-packages is not writeable
You should consider upgrading via the '/Library/Developer/CommandLineTools/usr/bin/python3 -m pip install --upgrade pip' command.[0m
Note: you may need to restart the kernel to use updated packages.


In [48]:
import praw
import pandas as pd
from datetime import datetime, timedelta

reddit = praw.Reddit(client_id='xIq0ALkJ0RWzM5ZLwwiQKA',
                     client_secret='DeHliktGK8nfhDXsJFiebqgeZKhHXQ',
                     user_agent='Matlpg')

start_date = current_date - timedelta(days=2)

subreddit = reddit.subreddit('apple')
top_posts = subreddit.new(limit=500)  # On récupère ainsi les 500 derniers posts Reddit sur Apple

posts_data = []
for post in top_posts:
    post_date = datetime.utcfromtimestamp(post.created_utc)
    if start_date <= post_date < current_date:
        text = post.selftext if post.selftext else "Text Not Available"
        post_data = {
            "Titre": post.title,
            "Auteur": str(post.author),
            "Texte": text,
            "Date": post_date,
            "url": post.url
        }
        posts_data.append(post_data)

df_reddit = pd.DataFrame(posts_data)


In [49]:
df_reddit['Cleaned_text']=df_reddit['Texte'].apply(cleaning_text)
df_reddit['Cleaned_text']=df_reddit['Cleaned_text'].apply(clean_stopwords)
df_reddit = df_reddit[~df_reddit['Texte'].str.contains("Text Not Available")]
df_reddit.head()

Unnamed: 0,Titre,Auteur,Texte,Date,url,Cleaned_text
4,"Daily Advice Thread - December 26, 2023",AutoModerator,Welcome to the Daily Advice Thread for /r/Appl...,2023-12-26 11:00:28,https://www.reddit.com/r/apple/comments/18r5ri...,welcome daily advice thread rapple thread used...
5,Spatial Cat videos for Vision Pro,WebAssemblyMan,Homemade spatial cat videos,2023-12-26 04:07:45,http://photorealityar.com/spatialcatvideos.html,homemade spatial cat videos
9,"Daily Advice Thread - December 25, 2023",AutoModerator,Welcome to the Daily Advice Thread for /r/Appl...,2023-12-25 11:00:41,https://www.reddit.com/r/apple/comments/18qgkw...,welcome daily advice thread rapple thread used...


In [50]:
def extract_submission_id(url):
    match = re.search(r'/comments/(\w+)/', url)
    return match.group(1) if match else None

def get_top_comments(url):
    submission_id = extract_submission_id(url)
    if not submission_id:
        print(f"Erreur pour '{url}'")
        return []
    submission = reddit.submission(id=submission_id)
    submission.comment_sort = 'top'
    submission.comments.replace_more(limit=0)
    
    top_comments = []
    for comment in submission.comments[:5]:  # Prendre les 5 premiers commentaires
        top_comments.append(comment.body)
    
    return top_comments


for index, row in df_reddit.iterrows():
    top_comments = get_top_comments(row['url'])
    for i, comment in enumerate(top_comments):
        df_reddit.loc[index, f'comment_{i+1}'] = comment

Erreur pour 'http://photorealityar.com/spatialcatvideos.html'


In [51]:
df_reddit.replace('nan', np.nan, inplace=True) #Il y avait un problème avec le formatage précédent des 'nan'
df_reddit = df_reddit.dropna(axis=0, subset=['comment_1', 'comment_2', 'comment_3', 'comment_4', 'comment_5'])
df_reddit.head()

Unnamed: 0,Titre,Auteur,Texte,Date,url,Cleaned_text,comment_1,comment_2,comment_3,comment_4,comment_5
4,"Daily Advice Thread - December 26, 2023",AutoModerator,Welcome to the Daily Advice Thread for /r/Appl...,2023-12-26 11:00:28,https://www.reddit.com/r/apple/comments/18r5ri...,welcome daily advice thread rapple thread used...,I'm helping my mom troubleshoot some payment i...,"I just got a new 14"" MacBook Pro for work. Wha...","I have an iPhone 15 Pro Max, I went to GameSto...","I own two iPhones (both are iPhone 11, if it's...",When is the new Magic Keyboard coming out and ...
9,"Daily Advice Thread - December 25, 2023",AutoModerator,Welcome to the Daily Advice Thread for /r/Appl...,2023-12-25 11:00:41,https://www.reddit.com/r/apple/comments/18qgkw...,welcome daily advice thread rapple thread used...,Does anyone know when the Airpods Pro were upd...,A month ago I realized that I needed to do som...,"as im sure others have, i got a new phone toda...","So I got an e-mail ""Your Apple ID information ...",Someone used my credit card to buy a $100 gift...


In [52]:
df_reddit['comment_1_clean'] = df_reddit['comment_1'].apply(cleaning_text)
df_reddit['comment_2_clean'] = df_reddit['comment_2'].apply(cleaning_text)
df_reddit['comment_3_clean'] = df_reddit['comment_3'].apply(cleaning_text)
df_reddit['comment_4_clean'] = df_reddit['comment_4'].apply(cleaning_text)
df_reddit['comment_5_clean'] = df_reddit['comment_5'].apply(cleaning_text)

In [53]:
model_transformers = pipeline("sentiment-analysis")
for index, row in df_reddit.iterrows():
    transformers_sentiments = []
    for col in ['comment_1_clean', 'comment_2_clean', 'comment_3_clean', 'comment_4_clean', 'comment_5_clean']:
        if pd.notnull(row[col]):
            transformers_result = model_transformers(row[col][:512])[0] #Il faut limiter la taille du texte à 512 caractères pour ce mmodèle comme vu précédemment
            transformers_sentiment = transformers_result['label'].lower()
            transformers_sentiments.append(transformers_sentiment)
transformers_sentiments


No model was supplied, defaulted to distilbert-base-uncased-finetuned-sst-2-english and revision af0f99b (https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.
All PyTorch model weights were used when initializing TFDistilBertForSequenceClassification.

All the weights of TFDistilBertForSequenceClassification were initialized from the PyTorch model.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFDistilBertForSequenceClassification for predictions without further training.


['negative', 'negative', 'negative', 'negative', 'negative']

In [54]:
pos=0
for el in transformers_sentiment:
    if el=="positive":
        pos+=1
pos_reddit=(pos/len(df_reddit))*100
pos_reddit

0.0

## Récupération des informations de Yahoo Finance 

On l'a déja vu dans le fichier dédié.

In [55]:
import matplotlib.pyplot as plt 
import numpy as np
import yfinance as yf 
import pandas as pd 
from datetime import datetime

APL = "AAPL"
data = yf.Ticker(APL) # Extraction avec la fonction yf.Ticker de yfinance
prix_rec = data.history(period = '1d', start = '2020-1-1', end = current_date)
prix_rec.to_csv('AAPLt.csv')
df=pd.read_csv("AAPLt.csv") 
df['Date'] = df['Date'].astype(str)
df['Date'] = df['Date'].str.slice(0, 10)
df['Date'] = pd.to_datetime(df['Date'])
df = df[['Date', 'Close']]
df

Unnamed: 0,Date,Close
0,2020-01-02,73.152649
1,2020-01-03,72.441452
2,2020-01-06,73.018684
3,2020-01-07,72.675278
4,2020-01-08,73.844353
...,...,...
998,2023-12-19,196.940002
999,2023-12-20,194.830002
1000,2023-12-21,194.679993
1001,2023-12-22,193.600006


### Prédiction de la valeur du lendemain 

On a vu dans la partie dédiée que le modèle le plus efficace était le naive forecast.
Cependant, par définition du modèle, la valeur du lendemain que le modèle va prédire est la valeur réelle d'aujourd'hui. Donc il ne sera pas possible de tirer de conclusion sur oui ou non il faut acheter l'action, car sa valeur prédite sera la même qu'aujourd'hui.

Pour résoudre ce problème, nous allons comparer la valeur prédite, donc la valeur à J+1, avec la valeur à J-1.
Si la valeur est supérieure, alors nous considérerons qu'il faut acheter. 
Sinon, il vaut mieux ne pas acheter.
Une autre solution proposée est d'utiliser quand même le modèle LTSM, bien qu'il soit moins précis que le modèle de naive forecast, mais qui ne nécessite pas de recourir à ce compromis.

La valeur prédite du lendemain est donc la valeur de l'action au jour d'input, c'est à dire la dernière valeur de fermeture du marché :

In [56]:
valeur_lendemain = df['Close'].iloc[-1]
valeur_lendemain

193.0500030517578

Comparons la a la valeur du jour J-1, qui est donc :

In [57]:
valeur_prec = df['Close'].iloc[-2]
valeur_prec

193.6000061035156

In [58]:
def acheter_ou_non():
    if valeur_lendemain > valeur_prec :
        return 'POSITIVE'
    else : 
        return 'NEGATIVE'
    
acheter_ou_non()

'NEGATIVE'

# Conclusion 

In [59]:
if rate_pos>0.5 and pos_reddit>50 and acheter_ou_non()=='POSITIVE':
    print("C'est le moment d'acheter")
else :
    print("Passez votre tour")


Passez votre tour
