*1_Problem_Statement.ipynb (this file)*

## Problem statement (AI-Software Overall Hypothesis)

**By analyzing supply chain dynamics (shipping times, carriers, supplier locations, production volumes, routes, and shipped product features), we can predict transportation costs. Combined with a chatbot that gives supply chain teams easy real-time shipment tracking and an integrated tool that aggregates relevant news from various webpages, this enhances decision-making on supply chain expense management while keeping operations aligned with the latest industry standards and insights.**

## Data Sources Overview and Feature Definition

This section defines the data sources used for each feature of the AI-driven software solution for supply chain optimization.

1.- **Supply Chain Public Dataset (Kaggle)**

We are leveraging a dataset specifically tailored for supply chain analysis, available on [Kaggle](https://www.kaggle.com/datasets/harshsingh2209/supply-chain-analysis/download?datasetVersionNumber=1). This dataset provides a solid foundation for developing our predictive pricing model. The dataset's close alignment with the client's actual data makes it an ideal choice for this stage. As we progress, we plan to conduct further tests and integrate ETL (Extract, Transform, Load) processes with the client's databases, contingent upon obtaining the required data access permissions.

With this data, we plan to implement a feature that focuses on predictive price modeling. This feature will identify cargo and shipments likely to incur higher costs and establish an alert system. Accessible directly through our software, this system will inform users about which shipments require closer attention or allow them to select a specific shipment to predict its price.
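
The alert logic described above can be sketched as follows. This is a minimal illustration, not the project's implementation: the `Shipment` fields, the `flag_costly_shipments` helper, the threshold value, and the `toy_predictor` stand-in for the trained model are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Shipment:
    # Hypothetical fields; the real feature set comes from the dataset.
    shipment_id: str
    carrier: str
    route: str
    shipping_time_days: float


def flag_costly_shipments(
    shipments: Iterable[Shipment],
    predict_cost: Callable[[Shipment], float],
    threshold: float,
) -> list[tuple[str, float]]:
    """Return (shipment_id, predicted_cost) pairs above the threshold, costliest first."""
    alerts = []
    for shipment in shipments:
        cost = predict_cost(shipment)
        if cost > threshold:
            alerts.append((shipment.shipment_id, cost))
    return sorted(alerts, key=lambda item: item[1], reverse=True)


def toy_predictor(shipment: Shipment) -> float:
    # Stand-in for the trained model: a toy rule, not a real predictor.
    return 100.0 * shipment.shipping_time_days


shipments = [
    Shipment("S-001", "Carrier A", "Route A", 2.0),
    Shipment("S-002", "Carrier B", "Route B", 7.5),
    Shipment("S-003", "Carrier C", "Route A", 5.0),
]
alerts = flag_costly_shipments(shipments, toy_predictor, threshold=400.0)
print(alerts)
```

Passing the predictor as a callable keeps the alert rule independent of the model type, so the same function could serve both the alert list and the single-shipment price lookup.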

2.- **Client's Operations Database (AWS RDS - MySQL Instance)**

...
With this data, we plan to integrate a feature for real-time shipment tracking accessible from instant messaging applications for convenience, etc. ...
...


3.- **Real-Time Supply Chain and Logistics News (Web Scraping)**

We are aggregating news from three primary sources in the air and maritime transport sectors. This initiative enables us to gather critical information for decision-making processes and enhances our understanding of current industry dynamics.

- [Air Cargo News](https://www.aircargonews.net/)
- [Maritime Logistics Professional](https://www.maritimelogisticsprofessional.com)
- [Seatrade Maritime](https://www.seatrade-maritime.com/)

The primary data extracted from these websites include news titles, text and a link to the full article, categorized into 'Global News' and 'LATAM News' to cater to user preferences.
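
As a rough sketch of the extraction step, the snippet below pulls titles and links out of an HTML page using only the standard library. The markup (`<h2>` headings wrapping `<a>` links) and the `example.com` URLs are hypothetical. Each real site has its own structure and would need its own selectors, and article text extraction is left out here.

```python
from html.parser import HTMLParser


class HeadlineParser(HTMLParser):
    """Collect title/link pairs from <a> tags nested inside <h2> elements."""

    def __init__(self):
        super().__init__()
        self.items = []
        self._in_h2 = False
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "h2":
            self._in_h2 = True
        elif tag == "a" and self._in_h2:
            # Start capturing only for links that sit inside a heading.
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            title = "".join(self._text).strip()
            self.items.append({"title": title, "link": self._href})
            self._href = None
        elif tag == "h2":
            self._in_h2 = False


# Hypothetical page fragment used in place of a live HTTP response.
SAMPLE_HTML = """
<article><h2><a href="https://example.com/news/1">Port congestion eases</a></h2>
<p>Summary text.</p></article>
<article><h2><a href="https://example.com/news/2">New air freight route announced</a></h2></article>
<a href="https://example.com/about">About</a>
"""

parser = HeadlineParser()
parser.feed(SAMPLE_HTML)
headlines = parser.items
print(headlines)
```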

With this data, we are developing a reporting feature that helps supply chain managers respond faster and make better-informed strategic decisions. The tool leverages NLP and AI to provide predictive insights into disruptions in maritime and air logistics. Users can interact with it in several ways:

- Choose the mode of transport: '🚢 Maritime' or '✈️ Air'
- Select the geographical focus: '🌐 Global' or '🌎 South America'
- Specify the type of news: 'Regulation', 'Issues', 'Supply Chain', etc.
- Set the number of news articles to display, from 1 to 10
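
The four user choices above amount to a filter over the scraped articles. A minimal sketch, assuming each article has already been tagged with a mode, region, and topic; the `NewsItem` structure, the `select_news` helper, and the sample entries are illustrative, not part of the actual tool:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NewsItem:
    title: str
    url: str
    mode: str    # "Maritime" or "Air"
    region: str  # "Global" or "South America"
    topic: str   # e.g. "Regulation", "Issues", "Supply Chain"


def select_news(items, mode, region, topic, limit):
    """Return up to `limit` articles matching the user's selections."""
    if not 1 <= limit <= 10:
        raise ValueError("limit must be between 1 and 10")
    matches = [
        item
        for item in items
        if item.mode == mode and item.region == region and item.topic == topic
    ]
    return matches[:limit]


# Placeholder articles for the example.
SAMPLE_NEWS = [
    NewsItem("Emission rules update", "https://example.com/a", "Maritime", "Global", "Regulation"),
    NewsItem("Air cargo capacity tightens", "https://example.com/b", "Air", "Global", "Issues"),
    NewsItem("Port fee changes", "https://example.com/c", "Maritime", "Global", "Regulation"),
    NewsItem("Regional customs reform", "https://example.com/d", "Maritime", "South America", "Regulation"),
]

picked = select_news(SAMPLE_NEWS, mode="Maritime", region="Global", topic="Regulation", limit=1)
print([item.title for item in picked])
```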

**Hugging Face Model Integration:**
Using a Hugging Face BERT model, the feature categorizes news and offers three daily recommendations ('Risks', 'Opportunities', and 'General').

**Premium Option:**
The premium layer of the feature provides:
- Translated summaries of news articles in Spanish
- Concise key point summaries for a business audience, limited to 350 characters
- In-depth analysis of the article's impact on maritime logistics, port operations, and supply chain management, specifically focusing on Latin America, labeled 'Impacto en LATAM', limited to 500 characters.
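
One way to enforce the 350- and 500-character limits on generated text is a post-processing step like the sketch below. The `truncate` and `build_premium_entry` helpers and the dictionary keys are assumptions for illustration. The source does not say how the limits are applied, and they could equally be enforced in the prompt that generates the text.

```python
KEY_POINTS_LIMIT = 350  # business summary limit stated above
IMPACT_LIMIT = 500      # 'Impacto en LATAM' limit stated above


def truncate(text: str, limit: int) -> str:
    """Cut text to at most `limit` characters, ending on a whole word plus an ellipsis."""
    text = " ".join(text.split())  # collapse whitespace and line breaks
    if len(text) <= limit:
        return text
    cut = text[: limit - 1]  # leave room for the ellipsis character
    if " " in cut:
        cut = cut.rsplit(" ", 1)[0]  # drop the last, possibly partial, word
    return cut.rstrip() + "…"


def build_premium_entry(key_points: str, latam_impact: str) -> dict:
    """Assemble the premium fields with their length limits applied."""
    return {
        "key_points": truncate(key_points, KEY_POINTS_LIMIT),
        "Impacto en LATAM": truncate(latam_impact, IMPACT_LIMIT),
    }


long_text = "word " * 200
entry = build_premium_entry(long_text, long_text)
print(len(entry["key_points"]), len(entry["Impacto en LATAM"]))
```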


*2_data_wrangling.ipynb*

## Stages:

1. Handle missing values
2. Define categorical features
3. Perform feature engineering
4. List insights for the Exploratory Data Analysis
5. Define the data transformations needed
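
Stages 1 to 3 might look roughly like the following pandas sketch. The column names, the imputation choices (median for numeric columns, an "Unknown" label for text columns), and the `cost_per_unit` feature are assumptions for illustration and do not reflect the notebook's actual decisions.

```python
import pandas as pd

# Small made-up frame standing in for the raw dataset.
raw = pd.DataFrame(
    {
        "carrier": ["Carrier A", "Carrier B", None, "Carrier A"],
        "route": ["Route A", "Route B", "Route A", "Route C"],
        "shipping_time_days": [3.0, None, 5.0, 8.0],
        "production_volume": [200, 450, 300, 800],
        "shipping_cost": [120.0, 310.0, 180.0, 640.0],
    }
)


def wrangle(df: pd.DataFrame) -> pd.DataFrame:
    out = df.copy()
    text_cols = ["carrier", "route"]

    # 1. Handle missing values: column median for numbers, a label for text.
    numeric_cols = out.select_dtypes(include="number").columns
    out[numeric_cols] = out[numeric_cols].fillna(out[numeric_cols].median())
    out[text_cols] = out[text_cols].fillna("Unknown")

    # 2. Define categorical features.
    out[text_cols] = out[text_cols].astype("category")

    # 3. Feature engineering: a hypothetical derived ratio.
    out["cost_per_unit"] = out["shipping_cost"] / out["production_volume"]
    return out


prepared = wrangle(raw)
print(prepared.dtypes)
```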

## Output:

Dataset prepared for EDA

*3_EDA.ipynb*

## Stages:

1. Ingest the dataset produced by data wrangling
2. Analyze categorical and numerical features
3. Select features based on their correlations
4. Select features and the target variable
5. Examine the distribution of numerical features
6. Select features based on their correlations
7. Re-define steps in data wrangling stages (if applicable)
8. Clean the dataset for modeling
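
Correlation-based feature selection, as listed in the stages above, could be sketched like this. The column names, the sample values, and the 0.5 cutoff are illustrative assumptions. The notebook may use a different threshold or correlation measure.

```python
import pandas as pd

# Made-up numeric data; "shipping_cost" plays the role of the target.
df = pd.DataFrame(
    {
        "shipping_time_days": [2, 4, 6, 8, 10],
        "production_volume": [100, 210, 290, 405, 500],
        "defect_rate": [0.3, 0.1, 0.4, 0.1, 0.3],
        "shipping_cost": [110, 205, 310, 395, 505],
    }
)


def select_by_correlation(df: pd.DataFrame, target: str, min_abs_corr: float) -> pd.Series:
    """Return features whose absolute Pearson correlation with the target meets the cutoff."""
    corr = df.select_dtypes(include="number").corr()[target].drop(target)
    kept = corr[corr.abs() >= min_abs_corr]
    # Strongest relationships first, regardless of sign.
    return kept.sort_values(key=lambda s: s.abs(), ascending=False)


selected = select_by_correlation(df, target="shipping_cost", min_abs_corr=0.5)
print(selected)
```

Pearson correlation only captures linear relationships, so a feature dropped here may still be useful to a non-linear model.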

## Output:

Dataset for modeling


*4_modeling.ipynb*

## Stages:

1. EDA Dataset Ingestion
2. Choose Model Type
3. Train/Test Phase
4. Save Intermediate Datasets
5. Model Evaluation Metrics
6. Try Different ML Models
7. Pick a Useful Metric
8. Condense Models and Metrics
9. Visualization of Performance Plots
10. Saving the Model
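
A condensed sketch of the train/test, comparison, and saving stages, assuming scikit-learn regressors and mean absolute error as the metric. The synthetic data, the two candidate models, and the `model.pkl` file name are placeholders, not the project's actual choices.

```python
import pickle
import tempfile
from pathlib import Path

import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Synthetic features and cost target standing in for the EDA output.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 3))
y = 50 + 20 * X[:, 0] + 5 * X[:, 1] + rng.normal(0, 2, size=200)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

candidates = {
    "linear_regression": LinearRegression(),
    "random_forest": RandomForestRegressor(n_estimators=50, random_state=0),
}

# Fit each candidate and record its error on the held-out split.
scores = {}
for name, model in candidates.items():
    model.fit(X_train, y_train)
    scores[name] = mean_absolute_error(y_test, model.predict(X_test))

best_name = min(scores, key=scores.get)  # lower MAE is better
best_model = candidates[best_name]

# Save the chosen model as a .pkl file and load it back to confirm the round trip.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "model.pkl"
    with path.open("wb") as fh:
        pickle.dump(best_model, fh)
    with path.open("rb") as fh:
        restored = pickle.load(fh)

print(scores, best_name)
```

Pickle files should only be loaded from trusted sources, and the scikit-learn version used for loading should match the one used for saving.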

## Output:

Trained model saved as a .pkl file for later use in pipelines and platform integration

*Deployment*

Predictive feature integration with software platform.