# Sentiment Analysis Using the CRISP-DM Process

### 1. **Business Understanding**
The goal of this analysis is to classify tweets into three categories based on their sentiment: **Positive**, **Negative**, or **Neutral**.
We will use a dataset of tweets for sentiment analysis, following the CRISP-DM process: **Business Understanding, Data Understanding, Data Preparation, Modeling, Evaluation, and Deployment.**

### 2. **Data Understanding**
We'll begin by loading and exploring the dataset to understand its structure and content.

In [None]:
# Import necessary libraries
import pandas as pd

# Load the dataset
df = pd.read_csv('/content/Twitter_Data.csv')

# Display the first few rows of the dataset
df.head()
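
Beyond `head()`, a few more calls help reveal the dataset's structure. The sketch below uses a small made-up DataFrame as a stand-in for the real CSV (the rows are invented for illustration; only the `text` column name is taken from the cells that follow), so it runs without the file.

```python
import pandas as pd

# Toy stand-in for the real CSV; the rows are invented for illustration
sample = pd.DataFrame({
    'text': ['I love this!', None, 'Worst service ever http://t.co/x', 'ok'],
})

# Number of rows and columns
print(sample.shape)

# Column data types
print(sample.dtypes)

# Count of missing values per column
missing = sample.isna().sum()
print(missing)

# Summary of tweet lengths (missing tweets are skipped by describe())
lengths = sample['text'].str.len()
print(lengths.describe())
```

Running the same calls on `df` would show whether the `text` column contains missing values before any cleaning is applied.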

### 3. **Data Preparation**
Next, we clean the tweet text: remove URLs and non-alphanumeric characters, convert everything to lowercase, and drop rows whose text is missing.

In [None]:
# Import the regular expression module used for cleaning
import re

# Function to clean tweet text
def clean_text(text):
    text = re.sub(r'http\S+', '', text)  # Remove URLs
    text = re.sub(r'[^A-Za-z0-9\s]', '', text)  # Remove special characters
    text = text.lower()  # Convert text to lowercase
    return text

# Drop rows with missing tweet text first; astype(str) would otherwise
# turn NaN into the literal string 'nan' and a later dropna() would miss it
df = df.dropna(subset=['text']).copy()

# Apply the cleaning function to the 'text' column
df['cleaned_text'] = df['text'].astype(str).apply(clean_text)

# Display the cleaned dataset
df[['cleaned_text']].head()
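
As a quick check of what the cleaning step does, the sketch below runs the same substitutions on a made-up tweet. The sample string and the whitespace-collapsing variant `clean_text_compact` are assumptions added for illustration (a hypothetical helper, not part of the notebook): removing a URL leaves a double space behind, which may or may not matter downstream.

```python
import re

def clean_text(text):
    """Same steps as the notebook's cleaning function."""
    text = re.sub(r'http\S+', '', text)         # Remove URLs
    text = re.sub(r'[^A-Za-z0-9\s]', '', text)  # Remove special characters
    return text.lower()                         # Convert text to lowercase

def clean_text_compact(text):
    """Hypothetical variant: also collapse whitespace runs and trim the ends."""
    return re.sub(r'\s+', ' ', clean_text(text)).strip()

sample_tweet = "Check THIS out! http://t.co/abc #wow"  # made-up example
basic = clean_text(sample_tweet)
compact = clean_text_compact(sample_tweet)

print(repr(basic))    # 'check this out  wow' (note the double space)
print(repr(compact))  # 'check this out wow'
```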

### 4. **Modeling**
We use the `TextBlob` library to compute a polarity score for each tweet. Polarity ranges from -1 (most negative) to 1 (most positive). A tweet is labelled Positive when its polarity is above 0, Negative when it is below 0, and Neutral when it is exactly 0.

In [None]:
# Import the TextBlob library for sentiment analysis
from textblob import TextBlob

# Function to classify sentiment
def get_sentiment(text):
    blob = TextBlob(text)
    polarity = blob.sentiment.polarity
    
    if polarity > 0:
        return 'Positive'
    elif polarity < 0:
        return 'Negative'
    else:
        return 'Neutral'

# Apply the function to classify sentiment for each tweet
df['sentiment'] = df['cleaned_text'].apply(get_sentiment)

# Display the first few rows with sentiment labels
df[['cleaned_text', 'sentiment']].head()
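
Because only a polarity of exactly 0 counts as Neutral, a tweet with a very weak score such as 0.02 is still labelled Positive. One possible alternative, sketched below, is a neutral band around zero. The function name `classify_polarity` and the `threshold=0.05` default are illustrative assumptions, not values taken from this notebook or from TextBlob; the function works on a plain float, so it can be tried without TextBlob installed.

```python
def classify_polarity(polarity, threshold=0.05):
    """Map a polarity score in [-1, 1] to a label, with a neutral band around zero."""
    if polarity > threshold:
        return 'Positive'
    if polarity < -threshold:
        return 'Negative'
    return 'Neutral'

# Try a few hand-picked scores
for score in (0.6, 0.02, 0.0, -0.02, -0.4):
    print(score, classify_polarity(score))
```

With `threshold=0` the function reproduces the rule used in `get_sentiment`; a suitable non-zero threshold would have to be chosen by inspecting the data.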

### 5. **Evaluation**
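We now check how the predicted sentiments are distributed across the dataset. Since no ground-truth labels are used here, this is a sanity check rather than a measured accuracy; manually inspecting a handful of labelled tweets helps judge how well the TextBlob approach works.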

In [None]:
# Check the distribution of sentiments
sentiment_counts = df['sentiment'].value_counts()
sentiment_counts
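
The manual inspection mentioned above can be sketched as follows. The DataFrame here is a toy stand-in with hand-assigned labels (not actual TextBlob output), and the choice of one tweet per label is an arbitrary assumption; on the real data the same calls would be applied to `df`.

```python
import pandas as pd

# Toy stand-in for the labelled DataFrame; labels are hand-assigned for illustration
labelled = pd.DataFrame({
    'cleaned_text': ['i love this', 'worst day ever', 'going to the store',
                     'great job team', 'this is awful', 'meeting at noon'],
    'sentiment': ['Positive', 'Negative', 'Neutral',
                  'Positive', 'Negative', 'Neutral'],
})

# Share of each label rather than raw counts
shares = labelled['sentiment'].value_counts(normalize=True)
print(shares)

# Shuffle the rows, then keep one tweet per label for manual review
to_review = (
    labelled.sample(frac=1, random_state=42)
            .groupby('sentiment')
            .head(1)
)
print(to_review)
```

Reading the sampled tweets next to their labels gives a rough sense of where the classifier goes wrong, for example on sarcasm or negation.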

### 6. **Deployment**
In a real-world scenario, this phase involves deploying the model to a production environment. For now, we'll focus on saving the results to a CSV file for further analysis.

In [None]:
# Save the sentiment analysis results to a CSV file
df.to_csv('/content/sentiment_analysis_results.csv', index=False)

# Display a success message
print("Sentiment analysis results saved to sentiment_analysis_results.csv")
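
A simple way to confirm the export worked is to read the file back and compare it with what was written. The sketch below does this with a two-row toy DataFrame and a temporary directory, both assumptions made so the example runs anywhere; the real notebook writes to `/content/`.

```python
import os
import tempfile

import pandas as pd

# Toy results standing in for the real DataFrame
results = pd.DataFrame({
    'cleaned_text': ['i love this', 'worst day ever'],
    'sentiment': ['Positive', 'Negative'],
})

with tempfile.TemporaryDirectory() as tmp_dir:
    out_path = os.path.join(tmp_dir, 'sentiment_analysis_results.csv')
    results.to_csv(out_path, index=False)
    # Read the file back while the temporary directory still exists
    reloaded = pd.read_csv(out_path)

print(list(reloaded.columns))
print(len(reloaded) == len(results))
```

One caveat: a tweet whose cleaned text is an empty string (for example, one that contained only a URL) is read back as a missing value by `pd.read_csv` with its default settings.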