# Milestone 2: Weeks 3-4
### Module 1: 
#### Global Data Monitoring & Analysis Engine
### Objective: 
#### Build a system that monitors global data sources for potential supply chain risks while analyzing sentiment across platforms.
### Tasks:
#### ● Implement LLMs (OpenAI GPT, Meta LLaMA) to analyze news articles, supplier information, and transportation updates for risk factors.
#### ● Aggregate global supply chain data into a structured format for analysis.

## COLLECTING DATA FOR A SPECIFIC PRODUCT: DIESEL

In [24]:
import requests
import pandas as pd
from datetime import datetime, timedelta
from dotenv import load_dotenv
import os
# Get the current date
today = datetime.today()
# Calculate the earliest allowed date (30 days ago for free plan)
earliest_date = today - timedelta(days=30)

def configure():
    """Load API keys from a local .env file into the environment."""
    load_dotenv()

configure()
# Read the NewsAPI key from the environment
api_key = os.getenv("news_api_key")
url = "https://newsapi.org/v2/everything"

# Define search parameters
keywords = [
    'diesel production OR fuel delays ',
    'diesel prices OR fuel prices ',
    'diesel shortages OR fuel shortages ',
    'diesel stockpiling OR fuel stockpiling ',
    'diesel supply chain disruptions OR fuel supply chain disruptions ',
    'diesel supply chain issues OR fuel supply chain issues ',
]
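# Clamp the requested start date to the earliest date the plan allows
start_date = "2024-01-01"
start_date = max(datetime.strptime(start_date, "%Y-%m-%d"), earliest_date).strftime("%Y-%m-%d")

# End the window today so it can never precede the clamped start date
end_date = today.strftime("%Y-%m-%d")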
articles_list = []

# Loop through each keyword and collect articles
for keyword in keywords:
    print(f"Fetching articles for keyword: {keyword}")
    for page in range(1, 6):  # Assuming a maximum of 5 pages per keyword
        params = {
            "q": keyword,
            "from": start_date,
            "to": end_date,
            "sortBy": "relevancy",
            "language": "en",
            "page": page,
            "apiKey": api_key
        }
        # A timeout keeps the loop from hanging on a stalled connection
        response = requests.get(url, params=params, timeout=30)
        
        if response.status_code == 200:
            data = response.json()
            articles = data.get("articles", [])
            if not articles:  # Break loop if no articles are found
                break
            articles_list.extend(articles)
        else:
            print(f"Error fetching data for {keyword} on page {page}: {response.status_code} - {response.text}")
            break

# Create a DataFrame and save results
if articles_list:
    diesel_df = pd.DataFrame(articles_list)
    # Keep only the relevant columns, then drop articles with duplicate titles
    diesel_df = diesel_df[["title", "description", "publishedAt"]]
    diesel_df = diesel_df.drop_duplicates(subset=["title"])
    diesel_df.to_csv("diesel_news_articles.csv", index=False)
    print(f"Collected {len(diesel_df)} unique articles. Data saved to 'diesel_news_articles.csv'.")
else:
    print("No articles found.")

Fetching articles for keyword: diesel production OR fuel delays 
Fetching articles for keyword: diesel prices OR fuel prices 
Fetching articles for keyword: diesel shortages OR fuel shortages 
Fetching articles for keyword: diesel stockpiling OR fuel stockpiling 
Fetching articles for keyword: diesel supply chain disruptions OR fuel supply chain disruptions 
Fetching articles for keyword: diesel supply chain issues OR fuel supply chain issues 
Collected 218 unique articles. Data saved to 'diesel_news_articles.csv'.
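
The date handling in the cell above can be factored into a small helper. The following is a minimal sketch rather than part of the original notebook: `clamp_window` and its `max_age_days` parameter are hypothetical names, and the 30-day limit is the free-plan assumption stated in the code comments above.

```python
from datetime import datetime, timedelta

def clamp_window(start, end, today, max_age_days=30):
    """Return (start, end) as YYYY-MM-DD strings inside the allowed window.

    Hypothetical helper: dates older than `max_age_days` before `today`
    are moved forward, dates after `today` are moved back, and `end`
    is never allowed to precede `start`.
    """
    fmt = "%Y-%m-%d"
    earliest = today - timedelta(days=max_age_days)
    start_dt = min(max(datetime.strptime(start, fmt), earliest), today)
    end_dt = min(max(datetime.strptime(end, fmt), start_dt), today)
    return start_dt.strftime(fmt), end_dt.strftime(fmt)

# A fixed "today" keeps the example reproducible
window = clamp_window("2024-01-01", "2024-12-01", datetime(2024, 12, 20))
print(window)  # ('2024-11-20', '2024-12-01')
```

Passing `today` explicitly makes the function easy to check without depending on the clock.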


In [25]:
diesel_df.description[0]


'Recently I published a roundup of quite awful hydrogen maritime trial efforts, and ended with a request that if others knew of more, they should share. A few additional ones bubbled up and there were a couple of minor corrections and delightful additions. I e…'

## Cleaning Data on Diesel news articles


In [26]:



import re

def clean_text(text):
    """Lowercase the text, strip punctuation, and collapse whitespace."""
    # Missing values (None or NaN) become empty strings
    if not isinstance(text, str):
        return ""
    text = re.sub(r"[\r\n\t]+", " ", text)  # line breaks and tabs -> spaces
    text = text.lower()
    text = re.sub(r"[^\w\s]", "", text)  # delete punctuation
    text = re.sub(r"\s+", " ", text).strip()  # collapse repeated whitespace
    return text

diesel_df["title"] = diesel_df["title"].apply(clean_text)
diesel_df["description"] = diesel_df["description"].apply(clean_text)

diesel_df.to_csv("cleaned_diesel_news_articles.csv", index=False)
print("Diesel Data cleaned and saved.")



Diesel Data cleaned and saved.
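
One side effect of the cleaning step is that deleting punctuation merges tokens that were only separated by a hyphen, which is why titles such as `used peugeot 3008 20162024 review` appear in the later tables. The sketch below reproduces the same steps on a made-up title and compares them with a hypothetical variant, `clean_text_spaced`, that replaces punctuation with a space instead. The variant is an illustration only and is not used elsewhere in this notebook.

```python
import re

def clean_text(text):
    """Mirror of the notebook's cleaning steps: punctuation is deleted."""
    if not isinstance(text, str):
        return ""
    text = re.sub(r"[\r\n\t]+", " ", text).lower()
    text = re.sub(r"[^\w\s]", "", text)
    return re.sub(r"\s+", " ", text).strip()

def clean_text_spaced(text):
    """Hypothetical variant: punctuation becomes a space, so tokens stay apart."""
    if not isinstance(text, str):
        return ""
    text = re.sub(r"[\r\n\t]+", " ", text).lower()
    text = re.sub(r"[^\w\s]", " ", text)
    return re.sub(r"\s+", " ", text).strip()

sample = "Used Peugeot 3008 (2016-2024) review"  # made-up input for illustration
print(clean_text(sample))         # used peugeot 3008 20162024 review
print(clean_text_spaced(sample))  # used peugeot 3008 2016 2024 review
```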


## Sentiment Analysis for Diesel news articles


In [27]:
from transformers import pipeline
import pandas as pd
diesel_df = pd.read_csv("cleaned_diesel_news_articles.csv")

# Load model and tokenizer
sentiment_analyzer = pipeline("sentiment-analysis",
                              model="distilbert-base-uncased-finetuned-sst-2-english",device=-1)
diesel_df['description'] = diesel_df['description'].fillna("").astype(str)

diesel_df['sentiment'] = diesel_df['description'].apply(lambda x: sentiment_analyzer(x)[0]['label'])

diesel_df


Device set to use cpu


Unnamed: 0,title,description,publishedAt,sentiment
0,more hydrogen maritime trials surface from sar...,recently i published a roundup of quite awful ...,2024-12-14T22:15:35Z,NEGATIVE
1,hydrogen ships are seeing same pattern as all ...,as the years pass and more reenactments of the...,2024-12-13T03:37:54Z,POSITIVE
2,vin diesel blames delay of the next fast and f...,as you know the next chapter in the fast and f...,2024-12-02T16:55:00Z,NEGATIVE
3,five ways to persuade more people to buy elect...,demand for electric cars is lower than expecte...,2024-12-01T00:20:31Z,NEGATIVE
4,used peugeot 3008 20162024 review,peugeots awardwinning crossover can be had for...,2024-12-02T12:00:00Z,POSITIVE
...,...,...,...,...
213,latest global industrial refrigeration equipme...,220 pages latest report according to a market ...,2024-12-05T06:30:00Z,NEGATIVE
214,moselle lock closure how to mitigate supply ch...,as a result of the moselle river accident this...,2024-12-16T05:00:00Z,NEGATIVE
215,fmi report key trends in energyefficient train...,the us train battery market is projected to re...,2024-12-03T14:30:00Z,POSITIVE
216,sustainable data center industry outlook forec...,cloud data center tech giants such as google m...,2024-12-19T09:30:00Z,POSITIVE


In [28]:
diesel_df['publishedAt'] = pd.to_datetime(diesel_df['publishedAt']).dt.strftime('%Y-%m-%d')


In [29]:
diesel_df.head()

Unnamed: 0,title,description,publishedAt,sentiment
0,more hydrogen maritime trials surface from sar...,recently i published a roundup of quite awful ...,2024-12-14,NEGATIVE
1,hydrogen ships are seeing same pattern as all ...,as the years pass and more reenactments of the...,2024-12-13,POSITIVE
2,vin diesel blames delay of the next fast and f...,as you know the next chapter in the fast and f...,2024-12-02,NEGATIVE
3,five ways to persuade more people to buy elect...,demand for electric cars is lower than expecte...,2024-12-01,NEGATIVE
4,used peugeot 3008 20162024 review,peugeots awardwinning crossover can be had for...,2024-12-02,POSITIVE


In [30]:
diesel_df['Month'] = pd.to_datetime(diesel_df['publishedAt']).dt.month_name()


In [31]:
# Call the method (with parentheses) to get the number of articles per month;
# without them the notebook displays the bound method instead of the counts
diesel_df["Month"].value_counts()

## Risk analysis using Groq API


In [32]:
import os
import pandas as pd
from groq import Groq
configure()
client = Groq(api_key=os.getenv("groq_api_key"))
# def analyze_risk(description):
#     # Prepare the message to be sent to the model
#     prompt = f"Identify the  risks on  diesel/fuel supply chain disruptions  described in the following text. Provide the 'reason for the risk'.'\nDescription: {description}\n"

#     # Make an API request to Groq to analyze the description
#     chat_completion = client.chat.completions.create(
#         messages=[{
#             "role": "user",
#             "content": prompt,
#         }],
#         model="llama-3.3-70b-versatile",
#     )


#     # Get the response content
#     response = chat_completion.choices[0].message.content.strip()
#     return response


## Adding risk analysis response to the dataframe


In [33]:


def analyze_risk(description, role="Consumer"):
    """
    Analyze risks specific to a role in the diesel supply chain.

    Parameters:
    - description: str - Description of the situation or scenario.
    - role: str - User's role in the supply chain (default: Consumer).

    Returns:
    - str: The model's response, containing the risks and suggested actions.
    """
    # Prepare the message for risk analysis
    prompt = f"""
    I am a {role} in the diesel supply chain. Based on the following description, analyze the risks specific to my role:
    {description}
    Provide the output in this format:
    - Risks: <List risks specific to the role on diesel>
    - Suggested Actions: <Steps to mitigate these risks>
    """

    # Make the API request to analyze the risks
    chat_completion = client.chat.completions.create(
        messages=[{"role": "user", "content": prompt}],
        model="llama-3.3-70b-versatile",
    )

    # Extract risk analysis from the response
    risk_analysis = chat_completion.choices[0].message.content

    return risk_analysis.strip()

diesel_df["risk_analysis"] = diesel_df["description"].apply(analyze_risk)

diesel_df


Unnamed: 0,title,description,publishedAt,sentiment,Month,risk_analysis
0,more hydrogen maritime trials surface from sar...,recently i published a roundup of quite awful ...,2024-12-14,NEGATIVE,December,"Based on the provided description, it appears ..."
1,hydrogen ships are seeing same pattern as all ...,as the years pass and more reenactments of the...,2024-12-13,POSITIVE,December,- Risks: As a consumer in the diesel supply ch...
2,vin diesel blames delay of the next fast and f...,as you know the next chapter in the fast and f...,2024-12-02,NEGATIVE,December,"- Risks: None, as the provided description doe..."
3,five ways to persuade more people to buy elect...,demand for electric cars is lower than expecte...,2024-12-01,NEGATIVE,December,- Risks: \n 1. Decreased demand for diesel ...
4,used peugeot 3008 20162024 review,peugeots awardwinning crossover can be had for...,2024-12-02,POSITIVE,December,"Based on the provided description, there is no..."
...,...,...,...,...,...,...
213,latest global industrial refrigeration equipme...,220 pages latest report according to a market ...,2024-12-05,NEGATIVE,December,"Based on the provided description, it appears ..."
214,moselle lock closure how to mitigate supply ch...,as a result of the moselle river accident this...,2024-12-16,NEGATIVE,December,- Risks: \n * Potential price increase of die...
215,fmi report key trends in energyefficient train...,the us train battery market is projected to re...,2024-12-03,POSITIVE,December,"Based on the description provided, as a Consum..."
216,sustainable data center industry outlook forec...,cloud data center tech giants such as google m...,2024-12-19,POSITIVE,December,"Based on the provided description, here's an a..."
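
The `risk_analysis` column holds free text, and the table above shows that the model does not always follow the requested `- Risks:` / `- Suggested Actions:` format (several replies start with "Based on the provided description..."). A minimal sketch of splitting a reply into its two sections is given below. `parse_risk_response` is a hypothetical helper, the sample replies are made up, and the matching is a plain case-sensitive substring search, so replies that reword or restyle the labels are treated as unformatted.

```python
def parse_risk_response(text):
    """Split a model reply into its 'Risks' and 'Suggested Actions' sections.

    Hypothetical helper: a section that is missing from the reply maps to None.
    """
    risks_tag, actions_tag = "Risks:", "Suggested Actions:"
    risks_at = text.find(risks_tag)
    actions_at = text.find(actions_tag)
    result = {"risks": None, "suggested_actions": None}

    if risks_at != -1:
        # The risks section runs up to the actions label, if that comes later
        end = actions_at if actions_at > risks_at else len(text)
        section = text[risks_at + len(risks_tag):end]
        # Drop the "-" bullet that introduces the next section
        result["risks"] = section.strip().rstrip("-").rstrip()

    if actions_at != -1:
        result["suggested_actions"] = text[actions_at + len(actions_tag):].strip()

    return result

# Made-up replies for illustration
reply = "- Risks: Potential price increase of diesel\n- Suggested Actions: Monitor prices and plan purchases"
parsed = parse_risk_response(reply)
print(parsed)

unformatted = parse_risk_response("Based on the provided description, there is no link to diesel.")
print(unformatted)
```

Applied to the column, this could look like `diesel_df["risk_analysis"].apply(parse_risk_response)`, with the `None` values marking replies that need manual review.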


## SENTIMENT ANALYSIS WITH DESCRIPTION


In [38]:
from transformers import pipeline

# Load model and tokenizer
sentiment_analyzer = pipeline("sentiment-analysis",
                              model="distilbert-base-uncased-finetuned-sst-2-english",device=-1)
                              
diesel_df['sentiment'] = diesel_df['description'].apply(lambda x: sentiment_analyzer(x)[0]['label'])
print(diesel_df["sentiment"].value_counts())
diesel_df['sentiment_encoded'] = diesel_df['sentiment'].map({'POSITIVE': 1, 'NEGATIVE': -1})
diesel_df['Risk'] = diesel_df['sentiment'].map({'POSITIVE': "Low", 'NEGATIVE': "High"})


diesel_df

Device set to use cpu


sentiment
NEGATIVE    145
POSITIVE     73
Name: count, dtype: int64


Unnamed: 0,title,description,publishedAt,sentiment,Month,risk_analysis,sentiment_encoded,Risk
0,more hydrogen maritime trials surface from sar...,recently i published a roundup of quite awful ...,2024-12-14,NEGATIVE,December,"Based on the provided description, it appears ...",-1,High
1,hydrogen ships are seeing same pattern as all ...,as the years pass and more reenactments of the...,2024-12-13,POSITIVE,December,- Risks: As a consumer in the diesel supply ch...,1,Low
2,vin diesel blames delay of the next fast and f...,as you know the next chapter in the fast and f...,2024-12-02,NEGATIVE,December,"- Risks: None, as the provided description doe...",-1,High
3,five ways to persuade more people to buy elect...,demand for electric cars is lower than expecte...,2024-12-01,NEGATIVE,December,- Risks: \n 1. Decreased demand for diesel ...,-1,High
4,used peugeot 3008 20162024 review,peugeots awardwinning crossover can be had for...,2024-12-02,POSITIVE,December,"Based on the provided description, there is no...",1,Low
...,...,...,...,...,...,...,...,...
213,latest global industrial refrigeration equipme...,220 pages latest report according to a market ...,2024-12-05,NEGATIVE,December,"Based on the provided description, it appears ...",-1,High
214,moselle lock closure how to mitigate supply ch...,as a result of the moselle river accident this...,2024-12-16,NEGATIVE,December,- Risks: \n * Potential price increase of die...,-1,High
215,fmi report key trends in energyefficient train...,the us train battery market is projected to re...,2024-12-03,POSITIVE,December,"Based on the description provided, as a Consum...",1,Low
216,sustainable data center industry outlook forec...,cloud data center tech giants such as google m...,2024-12-19,POSITIVE,December,"Based on the provided description, here's an a...",1,Low
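
The `Risk` column above is derived purely from the sentiment label, so it is a proxy: any negatively worded description is marked `High`, whether or not it concerns the diesel supply chain. The mapping itself can be written as a small function, sketched below. `encode_sentiment` and the `"Unknown"` fallback are assumptions added for illustration; with pandas `.map`, a label missing from the dictionary yields `NaN` instead.

```python
# Same label mappings as in the cell above
SENTIMENT_TO_SCORE = {"POSITIVE": 1, "NEGATIVE": -1}
SENTIMENT_TO_RISK = {"POSITIVE": "Low", "NEGATIVE": "High"}

def encode_sentiment(label):
    """Return (score, risk) for a sentiment label.

    Hypothetical helper: labels outside the two known values
    give (None, "Unknown") rather than raising an error.
    """
    return SENTIMENT_TO_SCORE.get(label), SENTIMENT_TO_RISK.get(label, "Unknown")

print(encode_sentiment("NEGATIVE"))  # (-1, 'High')
print(encode_sentiment("POSITIVE"))  # (1, 'Low')
print(encode_sentiment("NEUTRAL"))   # (None, 'Unknown')
```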


In [40]:
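# index=False matches the earlier saves and avoids an extra unnamed index column on reload
diesel_df.to_csv("DieselRiskAnalysis.csv", index=False)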

# MILESTONE 2 COMPLETED