# Data Analysis / Predictive Analysis
- toc: true
- categories: []
- type: ap
- week: 28

# What is Data Analysis
Data analysis is the process of examining, cleaning, transforming, and modeling data in order to extract useful information and draw conclusions.

# The process of data analysis 
Data collection: Gathering relevant data from various sources.

Data cleaning: Removing inconsistencies, errors, or missing values from the data.

Data transformation: Converting data into a format that is more suitable for analysis.

Data modeling: Using statistical or machine learning techniques to analyze the data.

Data visualization: Presenting the results of the analysis in a clear and concise manner using graphs, charts, or other visual aids.
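
As a rough illustration of the cleaning and transformation steps, here is a minimal pandas sketch. The `raw` table, its column names, and its values are made up for this example; real data will need its own cleaning rules.

```python
import pandas as pd

# Hypothetical raw data with inconsistent text, a duplicate row, and a missing price
raw = pd.DataFrame({
    "product": ["Shirt", "shirt ", "Jeans", "Dress", "Dress"],
    "price": [25.0, 25.0, 30.0, None, 40.0],
    "units_sold": [10, 10, 12, 8, 8],
})

# Data cleaning: normalize the product names, then drop rows
# with a missing price and rows that are exact duplicates
clean = raw.copy()
clean["product"] = clean["product"].str.strip().str.title()
clean = clean.dropna(subset=["price"]).drop_duplicates()

# Data transformation: derive a revenue column from price and units sold
clean["revenue"] = clean["price"] * clean["units_sold"]

# A simple summary that could feed a chart in the visualization step
summary = clean.groupby("product")["revenue"].sum()
print(summary)
```

Normalizing the text before calling `drop_duplicates()` matters here: "Shirt" and "shirt " only count as the same product once the stray space and casing are fixed.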

# How we might use data analysis in the real world

Business: In business, data analysis is used to understand customer behavior, identify trends, and make informed decisions. 

Healthcare: In healthcare, data analysis can be used to identify patients at high risk of developing a certain disease, allowing healthcare providers to intervene early and slow or prevent its progression.

Education: In education, data analysis is used to improve student outcomes and inform policy decisions.

![photo]({{site.baseurl}}/images/data ana.png)

# Predictive analysis 
Predictive analysis is the process of using data, statistical algorithms, and machine learning techniques to identify the likelihood of future outcomes based on historical data. It is a branch of advanced analytics.

Predictive analysis involves extracting information from data sets, often large ones, and identifying patterns and relationships that can be used to forecast future trends and outcomes.
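
To make the idea of forecasting from historical data concrete, here is a very small sketch that fits a straight line to past monthly sales and extends it one month ahead. The sales figures are invented and deliberately follow a perfect linear trend; real data is noisier, and a straight line is only one of many possible models.

```python
import numpy as np

# Hypothetical monthly sales that follow an exact linear trend
months = np.array([1, 2, 3, 4, 5])
sales = np.array([100.0, 120.0, 140.0, 160.0, 180.0])

# Fit a straight line (degree-1 polynomial) to the historical data
slope, intercept = np.polyfit(months, sales, 1)

# Extend the fitted line to forecast month 6
forecast = slope * 6 + intercept
print(f"Forecast for month 6: {forecast:.1f}")
```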

## Real-world scenario for predictive analysis

Imagine that you are a marketing manager for a clothing company, and you want to predict which products will be popular in the upcoming season. You can use predictive analysis to analyze past sales data and customer behavior to make predictions about which products are likely to be successful in the future.

To get started with predictive analysis, you will need to follow these basic steps:

Define the problem: In this case, the problem is predicting which products will be popular in the upcoming season.

Gather data: You will need to collect data on past sales, customer behavior, and other relevant factors.

Prepare the data: You will need to clean and transform the data to make it usable for analysis.

Choose a model: There are many different predictive models to choose from, such as linear regression, decision trees, and neural networks. You will need to choose the one that is best suited for your specific problem.

Train the model: You will need to use the historical data to train the model and adjust its parameters.

Evaluate the model: You will need to test the model using a separate set of data to see how accurate it is at making predictions.

Use the model: Once you are satisfied with the model's accuracy, you can use it to make predictions about which products will be popular in the upcoming season.
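
The "choose", "train", and "evaluate" steps can be sketched as follows. This is only an illustration under stated assumptions: the data is synthetic (sales are generated from price plus random noise), and the two candidate models and the `candidates` and `results` names are chosen for the example rather than taken from a real project.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic data (an assumption for this sketch): sales fall as price rises, plus noise
rng = np.random.default_rng(0)
price = rng.uniform(20, 60, size=200)
sales = 2000 - 20 * price + rng.normal(0, 50, size=200)
X = price.reshape(-1, 1)

# Hold out 20% of the rows so each model is scored on data it has not seen
X_train, X_test, y_train, y_test = train_test_split(
    X, sales, test_size=0.2, random_state=42
)

# Candidate models to compare
candidates = {
    "linear_regression": LinearRegression(),
    "decision_tree": DecisionTreeRegressor(max_depth=3, random_state=42),
}

# Train each candidate and record its mean squared error on the held-out rows
results = {}
for name, model in candidates.items():
    model.fit(X_train, y_train)
    results[name] = mean_squared_error(y_test, model.predict(X_test))

for name, mse in results.items():
    print(f"{name}: test MSE = {mse:.1f}")
```

Comparing candidates on the same held-out rows gives a like-for-like basis for picking one; the lower test error suggests the better fit for this particular data.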

In conclusion, predictive analysis can be a powerful tool in various industries, including marketing. By analyzing past data and making predictions about future events, you can gain a competitive edge and make more informed decisions.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Create hypothetical dataset
data = {'Product': ['Shirt', 'Jeans', 'Dress', 'Jacket', 'Skirt'],
        'Color': ['Blue', 'Black', 'Red', 'Green', 'Yellow'],
        'Size': ['S', 'M', 'L', 'XL', 'XXL'],
        'Price': [25, 30, 40, 50, 35],
        'Sales': [1000, 1200, 800, 600, 900]}
df = pd.DataFrame(data)

# One-hot encode categorical variables
encoder = OneHotEncoder()
X = encoder.fit_transform(df[['Color', 'Size']])
y = df['Sales']

# Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train linear regression model
model = LinearRegression()
model.fit(X_train, y_train)

# Make predictions on testing data
y_pred = model.predict(X_test)

# Evaluate model accuracy
mse = mean_squared_error(y_test, y_pred)
print(f"Mean squared error: {mse}")

# Make predictions on new data
new_data = {'Color': ['Blue', 'Black', 'Green'],
            'Size': ['M', 'L', 'XL']}
new_df = pd.DataFrame(new_data)
new_X = encoder.transform(new_df)
new_y_pred = model.predict(new_X)
print(new_y_pred)
```


```
Mean squared error: 140625.0
[912.5 812.5 600. ]
```


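- This code prints the mean squared error on the single held-out test row and the predicted sales for the new products. With only five rows, and with every color and size appearing exactly once, the numbers are illustrative only; note also that `Price` is not used as a feature. You can modify the dataset and add new products to make different predictions.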
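
One thing to watch when adding new products: with its default settings, `OneHotEncoder` raises an error if it is asked to transform a category it did not see during fitting. The sketch below, using a small made-up table and a hypothetical new color "Purple", shows that behavior next to the `handle_unknown="ignore"` option, which encodes an unseen category as all zeros instead.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Hypothetical training categories and a new product with an unseen color
train = pd.DataFrame({"Color": ["Blue", "Black", "Red"], "Size": ["S", "M", "L"]})
new = pd.DataFrame({"Color": ["Purple"], "Size": ["M"]})

# Default encoder: unknown categories cause a ValueError at transform time
strict = OneHotEncoder()
strict.fit(train)
try:
    strict.transform(new)
    strict_failed = False
except ValueError:
    strict_failed = True
print(f"Default encoder rejected the new color: {strict_failed}")

# Tolerant encoder: the unseen color becomes an all-zero block of columns
tolerant = OneHotEncoder(handle_unknown="ignore")
tolerant.fit(train)
encoded = tolerant.transform(new).toarray()
print(encoded)
```

Whether ignoring unknown categories is appropriate depends on the problem; an all-zero encoding means the model has no information about that color, so its prediction should be treated with extra caution.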

# Hacks 

1. How can NumPy and Pandas be used to preprocess data for predictive analysis?
2. What machine learning algorithms can be used for predictive analysis, and how do they differ?
3. Can you discuss some real-world applications of predictive analysis in different industries?
4. Can you explain the role of feature engineering in predictive analysis, and how it can improve model accuracy?
5. How can machine learning models be deployed in real-time applications for predictive analysis?
6. Can you discuss some limitations of NumPy and Pandas, and when it might be necessary to use other data analysis tools?
7. How can predictive analysis be used to improve decision-making and optimize business processes?