# Poem Classification using Deep Learning  

## 📌 Project Overview  

Poetry is a unique form of artistic expression, often encompassing deep emotions, vivid imagery, and rhythmic structures. This project aims to classify poems into one of four predefined genres using Natural Language Processing (NLP) and Deep Learning techniques.  

The four poem genres included in the dataset are:  
- **Affection** 💖 (Love, friendship, emotions)  
- **Environment** 🌿 (Nature, seasons, landscapes)  
- **Music** 🎵 (Melody, rhythm, musical themes)  
- **Death** ⚰️ (Loss, grief, mortality)  

We will build a deep learning model capable of understanding the thematic elements of poetry and classifying it into the appropriate genre.  

---

## 🎯 Project Objectives  

- **Text Preprocessing**: Cleaning poetry texts by removing special characters, tokenizing, and removing stopwords.  
- **Feature Extraction**: Using word embeddings (Word2Vec, GloVe, or embeddings from Transformer models like BERT) to represent poems in a numerical format.  
- **Model Training**: Training a deep learning model (LSTM, BiLSTM, or Transformer-based models) to classify poems.  
- **Evaluation**: Assessing model performance using accuracy, F1-score, and a confusion matrix.  
- **Deployment**: Integrating the trained model into a **Flask web application** for real-time poem classification.  
- **Model Tracking with MLflow**: Using **MLflow** to log experiment parameters and model performance, which supports reproducibility.  
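
The evaluation objective above names accuracy, F1-score, and a confusion matrix. The sketch below spells out how those three quantities can be computed for the four genres, using only the standard library so the definitions are explicit. The label lists are made up for illustration, and the macro-averaged F1 shown here is one reasonable choice, not necessarily the averaging this project uses. In practice, `accuracy_score`, `f1_score`, and `confusion_matrix` from `sklearn.metrics` cover the same ground.

```python
from collections import Counter

LABELS = ["Affection", "Environment", "Music", "Death"]


def confusion_matrix(y_true, y_pred, labels):
    """Rows are true labels, columns are predicted labels."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]


def accuracy(y_true, y_pred):
    """Fraction of poems whose predicted genre matches the true genre."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of per-class F1; a class with no counts scores 0."""
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Made-up labels and predictions, purely for illustration.
y_true = ["Affection", "Environment", "Music", "Death", "Music", "Death"]
y_pred = ["Affection", "Environment", "Music", "Music", "Music", "Death"]

print(confusion_matrix(y_true, y_pred, LABELS))
print(accuracy(y_true, y_pred))
print(macro_f1(y_true, y_pred, LABELS))
```

Macro averaging weights every genre equally, which makes a weak class visible even when the dataset is imbalanced.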

---

## 📜 Dataset Details  

The dataset consists of poetry texts, each labeled with one of four categories: **Affection, Environment, Music, and Death**. Each poem undergoes preprocessing steps such as:  

- **Removing special characters & punctuation**  
- **Lowercasing text**  
- **Tokenization & stopword removal**  
- **Using word embeddings for numerical representation**  
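
As a rough illustration of the first three steps listed above, the following sketch lowercases a poem, strips punctuation and other non-letter characters, tokenizes on whitespace, and drops stopwords. The `clean_poem` function and the tiny `STOPWORDS` set are hypothetical stand-ins. A real run would more likely use the NLTK or spaCy stopword lists and tokenizers named elsewhere in this project.

```python
import re

# Hypothetical, deliberately tiny stopword list for illustration only;
# NLTK or spaCy provide much fuller English lists.
STOPWORDS = {"a", "an", "and", "the", "of", "in", "on", "to", "is", "it"}


def clean_poem(text: str) -> list[str]:
    """Lowercase, strip non-letters, tokenize on whitespace, drop stopwords."""
    text = text.lower()
    # Replace anything that is not a lowercase ASCII letter or whitespace with a space.
    text = re.sub(r"[^a-z\s]", " ", text)
    tokens = text.split()
    return [tok for tok in tokens if tok not in STOPWORDS]


print(clean_poem("The rose, and the thorn!\nIt sings of June."))
```

Note that this simple regular expression also splits contractions (for example, "don't" becomes "don" and "t") and discards accented letters, so it may need adjusting for the actual poems.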

---

## 🏗️ Model Training Workflow  

1. **Data Preprocessing**: Cleaning the text data and converting words into numerical embeddings.  
2. **Model Selection**: Experimenting with deep learning architectures like LSTM, BiLSTM, and Transformer-based models.  
3. **Training & Hyperparameter Tuning**: Optimizing model performance by tuning hyperparameters.  
4. **Evaluation**: Comparing different models using evaluation metrics like accuracy, precision, recall, and F1-score.  
5. **Logging with MLflow**: Tracking model experiments, storing trained models, and comparing performance metrics.  
6. **Deployment**: Deploying the best-performing model as a **Flask web app** on **Render**.  
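
Step 1 of the workflow above converts words into numerical form. Before an embedding layer can look up vectors, each poem is usually turned into a fixed-length sequence of integer ids. The sketch below shows one way to do that with the standard library. The helper names (`build_vocab`, `encode`), the reserved ids, and the toy poems are assumptions for illustration, not the project's actual pipeline. Keras and other libraries ship their own tokenizers and padding utilities.

```python
from collections import Counter

PAD_ID = 0  # reserved for padding
UNK_ID = 1  # reserved for words not seen when the vocabulary was built


def build_vocab(tokenized_poems, min_count=1):
    """Map each sufficiently frequent token to an integer id, starting at 2."""
    counts = Counter(tok for poem in tokenized_poems for tok in poem)
    # Sort by descending frequency, then alphabetically, so ids are deterministic.
    kept = sorted(
        (tok for tok, c in counts.items() if c >= min_count),
        key=lambda tok: (-counts[tok], tok),
    )
    return {tok: idx for idx, tok in enumerate(kept, start=2)}


def encode(tokens, vocab, max_len):
    """Convert tokens to ids, truncate to max_len, and right-pad with PAD_ID."""
    ids = [vocab.get(tok, UNK_ID) for tok in tokens][:max_len]
    return ids + [PAD_ID] * (max_len - len(ids))


# Toy, already-tokenized poems used only to demonstrate the mapping.
poems = [["rose", "thorn", "rose"], ["river", "song"]]
vocab = build_vocab(poems)
print(vocab)
print(encode(["rose", "moon", "song"], vocab, max_len=5))
```

The resulting id sequences can feed an embedding layer whose weights are either learned from scratch or initialized from pretrained vectors such as Word2Vec or GloVe.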

---

## 🛠️ Technologies Used  

- **Python (3.x)** 🐍  
- **TensorFlow/Keras** 🤖  
- **NLTK / SpaCy** (For text preprocessing)  
- **Word2Vec / GloVe / BERT** (For word embeddings)  
- **LSTM / BiLSTM / Transformers** (Deep Learning models)  
- **MLflow** (For experiment tracking)  
- **Flask** (For web application)  
- **Render** (For model deployment)  

---

## 🚀 Next Steps  

- Experiment with **BERT embeddings** for improved contextual understanding.  
- Compare performance with **CNN-based text classification models**.  
- Implement **attention mechanisms** to improve classification accuracy.  
- Deploy the model as an **interactive web app** for users to input poetry and get instant classification results.  

---

💡 **Let's get started with the Poem Classification project!** 🚀  


In [6]:
# Import necessary libraries

import numpy as np
import pandas as pd
import seaborn as sns
import missingno as msno
from matplotlib import pyplot as plt 
%matplotlib inline

# Automated profiling report for exploratory analysis
# (newer releases of this package are published as ydata-profiling)
import pandas_profiling

# Ignore all warnings
import warnings
warnings.filterwarnings("ignore")




# preprocessing imports 
from sklearn.preprocessing import OrdinalEncoder
from sklearn.preprocessing import OneHotEncoder
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline


# Algorithms
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier, AdaBoostClassifier, HistGradientBoostingClassifier
from sklearn.svm import SVC, LinearSVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB, MultinomialNB, BernoulliNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis, QuadraticDiscriminantAnalysis


# Hyperparameter tuning
from sklearn.model_selection import RandomizedSearchCV


## Model evaluation metrics
from sklearn.metrics import classification_report, confusion_matrix, accuracy_score, precision_score, recall_score, f1_score


## Save the model with pickle
import pickle

## Tracking the Model 
import mlflow
import mlflow.sklearn

Load the data

In [None]:
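# Load the data into a DataFrame
# NOTE: the file name below refers to a depression dataset, not a poem dataset; confirm the path before running.
poem_df = pd.read_csv('D:\\Professional\\data\\Depression Professional Dataset.csv')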