# Brent Oil Prices Data Analysis Workflow

## Introduction

This notebook outlines the data analysis workflow for studying the relationship between Brent oil prices and major global events.
The objective is to provide data-driven insights on how political, economic, and social events impact Brent oil prices.


## Section 1: Data Analysis Workflow Steps

### Step 1: Data Loading and Preprocessing

- Load Brent Oil Prices data from the CSV file.
- Check for missing values, outliers, and data consistency.
- Convert date columns to datetime format and set up the data for time series analysis.
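
The checks listed above could be sketched roughly as follows. This is a minimal illustration rather than the notebook's actual implementation: `basic_quality_report` is a hypothetical helper, the 1.5 × IQR fence is just one common convention for flagging outliers, and the toy values are made up for the example.

```python
import pandas as pd

def basic_quality_report(df, price_col="Price", date_col="Date"):
    """Return simple data-quality counts for a price table (hypothetical helper)."""
    prices = df[price_col]
    q1, q3 = prices.quantile(0.25), prices.quantile(0.75)
    iqr = q3 - q1
    # Flag points outside the conventional 1.5 * IQR fences
    outliers = (prices < q1 - 1.5 * iqr) | (prices > q3 + 1.5 * iqr)
    return {
        "missing_prices": int(prices.isna().sum()),
        "duplicate_dates": int(df[date_col].duplicated().sum()),
        "iqr_outliers": int(outliers.sum()),
    }

# Toy values for illustration only, not taken from the dataset
toy = pd.DataFrame({
    "Date": pd.to_datetime([
        "2020-01-01", "2020-01-02", "2020-01-02",
        "2020-01-03", "2020-01-06", "2020-01-07",
    ]),
    "Price": [60.0, 61.0, 61.0, None, 62.0, 150.0],
})
report = basic_quality_report(toy)
print(report)
```

On a multi-decade price series, an IQR rule applied to price levels is only a first pass; flagged points are candidates for inspection, not necessarily errors.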


### Step 2: Exploratory Data Analysis (EDA)

- Visualize trends, seasonality, and possible correlations within the data.
- Identify key statistics (e.g., mean, median) and examine the distribution of oil prices over time.
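
As a rough sketch of these EDA steps, summary statistics and a rolling mean can be computed with pandas alone. The series below uses toy values, and the 3-observation window is an arbitrary choice for illustration; a real analysis would likely use a longer window on the actual price data.

```python
import pandas as pd

# Toy prices for illustration only, not taken from the dataset
prices = pd.Series(
    [10.0, 12.0, 11.0, 13.0, 14.0],
    index=pd.date_range("2020-01-01", periods=5, freq="D"),
    name="Price",
)

# Key statistics in one call
summary = prices.agg(["mean", "median", "std", "min", "max"])

# Rolling mean smooths short-term noise so the trend is easier to see;
# the first (window - 1) values are NaN because the window is incomplete
rolling_mean = prices.rolling(window=3).mean()

# Average price per calendar year, a simple view of the distribution over time
yearly_mean = prices.groupby(prices.index.year).mean()

print(summary)
```

Both `prices` and `rolling_mean` can then be drawn on the same axes with their `.plot()` methods to compare the raw series with the smoothed trend.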


### Step 3: Model Selection and Initial Hypotheses

- Review time series models like ARIMA, GARCH, or Bayesian models.
- Define initial hypotheses on how events might correlate with oil price changes.
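
One simple way to turn an event hypothesis into something testable is to compare average daily log returns just before and just after an event date. The sketch below assumes a price series indexed by date; `event_window_change`, the window length, the event date, and the toy prices are all hypothetical, and this is a descriptive comparison rather than a fitted ARIMA, GARCH, or Bayesian model.

```python
import numpy as np
import pandas as pd

def event_window_change(prices, event_date, window=3):
    """Mean daily log return over `window` observations before and after an event.

    Hypothetical helper: `prices` is a Series indexed by date.
    """
    log_ret = np.log(prices).diff().dropna()
    event_date = pd.Timestamp(event_date)
    before = log_ret[log_ret.index < event_date].tail(window)
    after = log_ret[log_ret.index >= event_date].head(window)
    return before.mean(), after.mean()

# Toy series: flat prices, then 10% daily growth starting on the event date
prices = pd.Series(
    [100.0, 100.0, 100.0, 100.0, 110.0, 121.0, 133.1],
    index=pd.date_range("2020-01-01", periods=7, freq="D"),
)
before_mean, after_mean = event_window_change(prices, "2020-01-05", window=3)
print(before_mean, after_mean)
```

A clear difference between the two means is only suggestive; other events falling in the same window could explain it equally well.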


### Step 4: Assumptions and Limitations

- Note assumptions (e.g., stationarity of data for time series models).
- Identify limitations, such as dependence on external data sources or non-stationary behavior (e.g., structural breaks) around major events.
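
The stationarity assumption can be probed informally by comparing summary statistics across the two halves of a series. The sketch below is a rough heuristic, not a formal test (a unit-root test such as `adfuller` from statsmodels would be the more usual choice); `half_split_stats` is a hypothetical helper, and the steadily growing synthetic series stands in for real prices.

```python
import numpy as np
import pandas as pd

def half_split_stats(series):
    """Mean and std of the first and second half of a series (informal check)."""
    mid = len(series) // 2
    first, second = series.iloc[:mid], series.iloc[mid:]
    return pd.DataFrame(
        {
            "mean": [first.mean(), second.mean()],
            "std": [first.std(), second.std()],
        },
        index=["first_half", "second_half"],
    )

# Synthetic series growing 1% per step: the level is clearly non-stationary
prices = pd.Series(100.0 * 1.01 ** np.arange(100))

# Log returns (differenced log prices) are constant here, so their mean is stable
log_returns = np.log(prices).diff().dropna()

level_stats = half_split_stats(prices)
return_stats = half_split_stats(log_returns)
print(level_stats)
print(return_stats)
```

The level statistics shift between halves while the log-return statistics do not, which is the usual motivation for modelling returns or differenced prices instead of raw levels.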


### Step 5: Communicating Results

- Determine the channels (e.g., report, blog post) and formats (e.g., visualizations, summary tables) for sharing insights.



## Import the Necessary Libraries

In [10]:
import os
print(os.getcwd())  # This prints the current working directory
os.chdir(r'c:\users\ermias.tadesse\10x\Oil-Price-Insights')  # Set the working directory to the project root

import pandas as pd
import matplotlib.pyplot as plt

c:\users\ermias.tadesse\10x\Oil-Price-Insights


## Initialize DataProcessor Instance

### Data Loading and Preprocessing
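The project's `DataProcessor` class takes the path to the Brent oil prices CSV file and handles data loading, parsing, and initial cleaning. The cell below does not instantiate it; it reads the same file directly with `pd.read_csv` and previews the first rows.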

In [12]:
# Load the dataset
file_path = 'Data/Raw/BrentOilPrices.csv'
data = pd.read_csv(file_path)

# Display the first few rows of the dataset
data.head()

|   | Date      | Price |
|---|-----------|-------|
| 0 | 20-May-87 | 18.63 |
| 1 | 21-May-87 | 18.45 |
| 2 | 22-May-87 | 18.55 |
| 3 | 25-May-87 | 18.60 |
| 4 | 26-May-87 | 18.63 |


### Inspect the dataset for missing values and data types

In [13]:
# Inspect the dataset for missing values and data types
data.info()
data.isnull().sum()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 9011 entries, 0 to 9010
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   Date    9011 non-null   object 
 1   Price   9011 non-null   float64
dtypes: float64(1), object(1)
memory usage: 140.9+ KB


Date     0
Price    0
dtype: int64

### Data Preprocessing

In [14]:
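# Convert the 'Date' column to datetime format.
# format='mixed' (pandas >= 2.0) infers the format of each value individually,
# which helps if the file contains more than one date style.
data['Date'] = pd.to_datetime(data['Date'], format='mixed')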

# Sort the data by date
data = data.sort_values(by='Date')

# Reset index after sorting
data.reset_index(drop=True, inplace=True)
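
To finish setting the data up for time series analysis, one option is to move the dates into the index and confirm they are sorted and unique. The sketch below is an assumption about a reasonable next step, not part of the original notebook; it rebuilds three rows from the preview above so that it runs on its own.

```python
import pandas as pd

# Three rows from the preview above, deliberately out of order
data = pd.DataFrame({
    "Date": pd.to_datetime(["1987-05-22", "1987-05-20", "1987-05-21"]),
    "Price": [18.55, 18.63, 18.45],
})

# Sort by date and use the dates as the index to get a time series
ts = data.sort_values("Date").set_index("Date")["Price"]

# Sanity checks that most time series tools implicitly rely on
is_sorted = ts.index.is_monotonic_increasing
is_unique = ts.index.is_unique

# Calendar-day gaps between consecutive observations (weekends and holidays show up as gaps > 1)
gaps = ts.index.to_series().diff().dt.days

print(ts)
```

With a `DatetimeIndex` in place, date-based slicing such as `ts.loc["1987-05"]` and resampling become available.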