# ML Pipeline Preparation
Follow the instructions below to help you create your ML pipeline.
### 1. Import libraries and load data from database.
- Import Python libraries
- Load dataset from database with [`read_sql_table`](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_sql_table.html)
- Define feature and target variables X and Y
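
A minimal sketch of the load-and-split step, assuming a hypothetical in-memory SQLite table named `messages_demo` with made-up rows. To stay runnable without SQLAlchemy it uses `pd.read_sql` over a plain `sqlite3` connection instead of `read_sql_table`; the layout (four metadata columns followed by label columns) mirrors the dataset loaded later in this notebook.

```python
import sqlite3

import pandas as pd

# Hypothetical toy rows; the real table has 36 label columns.
demo = pd.DataFrame({
    "id": [2, 7, 8],
    "message": ["cold front coming", "is the hurricane over", "looking for someone"],
    "original": [None, None, None],
    "genre": ["direct", "direct", "direct"],
    "related": [1, 1, 1],
    "storm": [0, 1, 0],
})

# Write the toy frame to an in-memory database and read it back.
conn = sqlite3.connect(":memory:")
demo.to_sql("messages_demo", conn, index=False)
df_demo = pd.read_sql("SELECT * FROM messages_demo", conn)
conn.close()

# Feature: the message text. Targets: every column after the four metadata columns.
X_demo = df_demo["message"].values
Y_demo = df_demo.iloc[:, 4:].values
```

Selecting the targets by position (`iloc[:, 4:]`) only works while the metadata columns stay first, so it is worth checking `df.columns` after loading.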

In [1]:
!pip install --upgrade scikit-learn

Collecting scikit-learn
  Downloading https://files.pythonhosted.org/packages/f5/ef/bcd79e8d59250d6e8478eb1290dc6e05be42b3be8a86e3954146adbc171a/scikit_learn-0.24.2-cp36-cp36m-manylinux1_x86_64.whl (20.0MB)
Collecting threadpoolctl>=2.0.0 (from scikit-learn)
  Downloading https://files.pythonhosted.org/packages/61/cf/6e354304bcb9c6413c4e02a747b600061c21d38ba51e7e544ac7bc66aecc/threadpoolctl-3.1.0-py3-none-any.whl
Collecting numpy>=1.13.3 (from scikit-learn)
  Downloading https://files.pythonhosted.org/packages/45/b2/6c7545bb7a38754d63048c7696804a0d947328125d81bf12beaa692c3ae3/numpy-1.19.5-cp36-cp36m-manylinux1_x86_64.whl (13.4MB)
tensorflow 1.3.0 requires tensorflow-tensorboard<0.2.0,>=0.1.0, which is not installed.
Installing collected packages: threadpoolctl, numpy, scikit-learn
  Found existing installation: nu

In [10]:
!pip install nltk



In [12]:
# import libraries
import re
import sklearn
import numpy as np
import pandas as pd
import nltk

nltk.download(['punkt', 'stopwords', 'wordnet'])
from nltk.tokenize import word_tokenize
from nltk.stem import WordNetLemmatizer
from nltk.corpus import stopwords
from sklearn.naive_bayes import MultinomialNB
#from utils import tokenize
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, classification_report, accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split, RandomizedSearchCV
from sklearn.multioutput import MultiOutputClassifier
from sklearn.preprocessing import FunctionTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline, FeatureUnion
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sqlalchemy import create_engine
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import TfidfVectorizer  


[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
[nltk_data] Downloading package wordnet to /root/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!


In [13]:
print(sklearn.__version__)

0.24.2


In [14]:
# load data from database
engine = create_engine('sqlite:///DisasterPrediction.db')
df = pd.read_sql('SELECT * FROM disaster_project', engine)

In [6]:
df.head()

Unnamed: 0,id,message,original,genre,related,request,offer,aid_related,medical_help,medical_products,...,aid_centers,other_infrastructure,weather_related,floods,storm,fire,earthquake,cold,other_weather,direct_report
0,2,Weather update - a cold front from Cuba that c...,Un front froid se retrouve sur Cuba ce matin. ...,direct,1,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,7,Is the Hurricane over or is it not over,Cyclone nan fini osinon li pa fini,direct,1,0,0,1,0,0,...,0,0,1,0,1,0,0,0,0,0
2,8,Looking for someone but no name,"Patnm, di Maryani relem pou li banm nouvel li ...",direct,1,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,9,UN reports Leogane 80-90 destroyed. Only Hospi...,UN reports Leogane 80-90 destroyed. Only Hospi...,direct,1,1,0,1,0,1,...,0,0,0,0,0,0,0,0,0,0
4,12,"says: west side of Haiti, rest of the country ...",facade ouest d Haiti et le reste du pays aujou...,direct,1,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [7]:
print(df.info())

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 26216 entries, 0 to 26215
Data columns (total 40 columns):
id                        26216 non-null int64
message                   26216 non-null object
original                  10170 non-null object
genre                     26216 non-null object
related                   26216 non-null int64
request                   26216 non-null int64
offer                     26216 non-null int64
aid_related               26216 non-null int64
medical_help              26216 non-null int64
medical_products          26216 non-null int64
search_and_rescue         26216 non-null int64
security                  26216 non-null int64
military                  26216 non-null int64
child_alone               26216 non-null int64
water                     26216 non-null int64
food                      26216 non-null int64
shelter                   26216 non-null int64
clothing                  26216 non-null int64
money                     26216 non-null i

In [8]:
# define feature and target variables X and Y
X = df['message'].values

print(type(X))
Y = df.iloc[:, 4:].values

print(Y)

<class 'numpy.ndarray'>
[[1 0 0 ... 0 0 0]
 [1 0 0 ... 0 0 0]
 [1 0 0 ... 0 0 0]
 ...
 [1 0 0 ... 0 0 0]
 [1 0 0 ... 0 0 0]
 [1 0 0 ... 0 0 0]]


### 2. Write a tokenization function to process your text data

In [None]:
def tokenize(text):
    """
    Tokenize the input text by converting it to lowercase, replacing punctuation
    with spaces, splitting it into words, removing stopwords, and lemmatizing.

    Parameters:
    text (str): The input text to tokenize.

    Returns:
    list: A list of lemmatized tokens with punctuation and stopwords removed.
    """
    # convert the input text to lowercase
    text = text.lower()

    # replace every non-alphanumeric character with a space
    text = re.sub(r"[^0-9a-zA-Z]", " ", text)

    # tokenize text
    words = word_tokenize(text)

    # build the stopword set once instead of reloading the list for every token
    stop_words = set(stopwords.words("english"))

    # initialize the lemmatizer
    lemmatizer = WordNetLemmatizer()

    # lemmatize each token, skipping stopwords
    tokens = [lemmatizer.lemmatize(w) for w in words if w not in stop_words]

    return tokens


The `tokenize` function is kept in a separate file, `tokenize.py`, because of errors encountered when pickling the model.
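
One likely reason for such errors is that `pickle` stores a function by reference (module name plus function name) rather than by value, so the function has to be importable from a real module wherever the model is loaded. The sketch below illustrates that standard behaviour with a hypothetical stand-in function; it does not reproduce the original error.

```python
import pickle


def tokenize_stub(text):
    """Hypothetical stand-in for the notebook's tokenize function."""
    return text.lower().split()


# The payload holds only the module and function names, not the function body.
payload = pickle.dumps(tokenize_stub)
stores_name_only = b"tokenize_stub" in payload and b"lower" not in payload

# A lambda has no importable name, so pickling it fails.
try:
    pickle.dumps(lambda text: text.split())
    lambda_picklable = True
except (pickle.PicklingError, AttributeError):
    lambda_picklable = False
```

Because only the reference is saved, a pipeline pickled from a notebook points at `__main__.tokenize`, which does not exist in a script or web app that later loads the file. Importing the function from a module in both places avoids that mismatch.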

### 3. Build a machine learning pipeline
This machine learning pipeline should take in the `message` column as input and output classification results on the other 36 categories in the dataset. You may find the [MultiOutputClassifier](http://scikit-learn.org/stable/modules/generated/sklearn.multioutput.MultiOutputClassifier.html) helpful for predicting multiple target variables.
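
As a rough illustration of what `MultiOutputClassifier` does, the sketch below fits one `LogisticRegression` per label column on a handful of made-up messages. The messages and the two label columns (`water`, `storm`) are hypothetical and far too small to produce meaningful predictions.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multioutput import MultiOutputClassifier
from sklearn.pipeline import Pipeline

messages = ["we need water", "storm is coming", "need water after storm", "all is fine"]
labels = np.array([[1, 0], [0, 1], [1, 1], [0, 0]])  # hypothetical columns: water, storm

toy_pipeline = Pipeline([
    ("vect", TfidfVectorizer()),
    ("clf", MultiOutputClassifier(LogisticRegression(max_iter=1000))),
])
toy_pipeline.fit(messages, labels)

# One prediction per label column for each input message.
toy_pred = toy_pipeline.predict(["need water"])

# The wrapper keeps one fitted estimator per label column.
n_fitted = len(toy_pipeline.named_steps["clf"].estimators_)
```

Each label is modelled independently, which is why a single label column with only one class is enough to make the whole fit fail.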

In [9]:
# Build the pipeline
pipeline_lr = Pipeline([
    ('features', FeatureUnion([
        ('text_pipeline', Pipeline([
            ('vect', CountVectorizer(tokenizer=tokenize)),   # Tokenize the message column
            ('tfidf', TfidfTransformer())                    # Transform into TF-IDF
        ]))
    ])),
    ('clf', MultiOutputClassifier(LogisticRegression(max_iter=1000)))  # Apply Logistic Regression
])
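
The `CountVectorizer` plus `TfidfTransformer` pair used here produces the same matrix as a single `TfidfVectorizer`, which is what the export step later in this notebook switches to. A small check on made-up documents, using default settings for both:

```python
import numpy as np
from sklearn.feature_extraction.text import (
    CountVectorizer,
    TfidfTransformer,
    TfidfVectorizer,
)
from sklearn.pipeline import Pipeline

docs = ["we need water and food", "storm destroyed the bridge", "water is needed after the storm"]

# Two-step variant: raw counts first, then TF-IDF weighting.
two_step = Pipeline([("vect", CountVectorizer()), ("tfidf", TfidfTransformer())])

# One-step variant.
one_step = TfidfVectorizer()

same_output = np.allclose(
    two_step.fit_transform(docs).toarray(),
    one_step.fit_transform(docs).toarray(),
)
```

The two-step form is mainly useful when the raw counts are needed elsewhere; otherwise the one-step form leaves fewer parameters to keep in sync.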
 

### 4. Train pipeline
- Split data into train and test sets
- Train pipeline

In [10]:
# split data into training and test sets
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.3, random_state=42)

In [16]:
# train the model
pipeline_lr.fit(X_train, Y_train)

# make predictions
Y_pred = pipeline_lr.predict(X_test)

ValueError: This solver needs samples of at least 2 classes in the data, but the data contains only one class: 0

In [11]:
# Check if there are columns with only one class in the target variables
for i, column in enumerate(Y_train.T):
    unique_values = set(column)
    if len(unique_values) == 1:
        print(f"Column {i} has only one class: {unique_values}")

Column 9 has only one class: {0}


In [12]:
# Identify columns with only one class in Y_train
columns_to_drop = [i for i, column in enumerate(Y_train.T) if len(set(column)) == 1]

In [13]:
# Remove columns with only one class from Y_train and Y_test

Y_train_filtered = np.delete(Y_train, columns_to_drop, axis=1)
Y_test_filtered = np.delete(Y_test, columns_to_drop, axis=1)
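
# The same filtering can be sketched with a boolean mask instead of a Python loop.
# The arrays below are made up; the point is that the mask is computed on the
# training labels only and then applied to both splits so their columns stay aligned.

```python
import numpy as np

Y_train_demo = np.array([[1, 0, 0],
                         [0, 0, 1],
                         [1, 0, 1]])
Y_test_demo = np.array([[0, 1, 1],
                        [1, 0, 0]])

# A column is constant when every row equals the first row in that column.
constant = (Y_train_demo == Y_train_demo[0]).all(axis=0)
keep = ~constant

# Apply the training-set mask to both splits.
Y_train_kept = Y_train_demo[:, keep]
Y_test_kept = Y_test_demo[:, keep]

dropped = np.flatnonzero(constant).tolist()
```

Keeping the `keep` mask (or the dropped indices) around also makes it possible to map predicted columns back to their category names later.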


In [14]:
# train the model
pipeline_lr.fit(X_train, Y_train_filtered)

# make predictions
Y_pred_filtered = pipeline_lr.predict(X_test)


In [15]:
print(Y_pred_filtered)

[[1 1 0 ... 0 0 1]
 [1 0 0 ... 0 0 0]
 [1 0 0 ... 0 0 0]
 ...
 [1 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]]



### 5. Test your model
Report the f1 score, precision and recall for each output category of the dataset. You can do this by iterating through the columns and calling sklearn's `classification_report` on each.

In [22]:
# Initialize lists to store metrics for each category
f1_scores = []
precisions = []
recalls = []

# Iterate through each column of the output categories
for i, column in enumerate(Y_test_filtered.T):
    print(f"Category {i}:")
    
    # Generate the classification report
    report = classification_report(Y_test_filtered[:, i], Y_pred_filtered[:, i], zero_division=0)
    print(report)

    # Optionally, extract F1, precision, and recall scores from the report
    # Parse the classification report into a dictionary
    report_dict = classification_report(Y_test_filtered[:, i], Y_pred_filtered[:, i], output_dict=True, zero_division=0)
    
    # Append the 'weighted avg' scores to the lists
    f1_scores.append(report_dict['weighted avg']['f1-score'])
    precisions.append(report_dict['weighted avg']['precision'])
    recalls.append(report_dict['weighted avg']['recall'])

# Print out the average metrics across all categories
avg_f1 = sum(f1_scores) / len(f1_scores)
avg_precision = sum(precisions) / len(precisions)
avg_recall = sum(recalls) / len(recalls)

print(f"Overall F1 Score: {avg_f1:.4f}")
print(f"Overall Precision: {avg_precision:.4f}")
print(f"Overall Recall: {avg_recall:.4f}")



Category 0:
              precision    recall  f1-score   support

           0       0.71      0.45      0.55      1873
           1       0.84      0.94      0.89      5934
           2       0.00      0.00      0.00        58

    accuracy                           0.82      7865
   macro avg       0.52      0.47      0.48      7865
weighted avg       0.80      0.82      0.80      7865

Category 1:
              precision    recall  f1-score   support

           0       0.91      0.98      0.94      6533
           1       0.83      0.53      0.65      1332

    accuracy                           0.90      7865
   macro avg       0.87      0.75      0.80      7865
weighted avg       0.90      0.90      0.89      7865

Category 2:
              precision    recall  f1-score   support

           0       1.00      1.00      1.00      7829
           1       0.00      0.00      0.00        36

    accuracy                           1.00      7865
   macro avg       0.50      0.50     

Averaged across all categories, the model's weighted-average scores are an F1 of 0.9363, a precision of 0.9389, and a recall of 0.9471. Because each per-category score is weighted by class support, these figures are dominated by the majority class in each category; the per-category reports above show that rare positive classes (for example, category 2, with 36 positive test samples) can still have a precision and recall of 0.00. Note also that this is a multi-output task with one classifier per category rather than a single multi-class problem.
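
The overall figures above are built from each category's `weighted avg` row, which can look strong even when the positive class is never predicted. A small sketch with made-up labels (99 negatives, 1 positive, and a classifier that always predicts 0) shows the gap between weighted and macro averaging:

```python
from sklearn.metrics import f1_score

# Hypothetical imbalanced label: 99 negatives and a single positive.
y_true = [0] * 99 + [1]

# A classifier that never predicts the positive class.
y_pred = [0] * 100

weighted_f1 = f1_score(y_true, y_pred, average="weighted", zero_division=0)
macro_f1 = f1_score(y_true, y_pred, average="macro", zero_division=0)

print(f"weighted F1: {weighted_f1:.3f}, macro F1: {macro_f1:.3f}")
```

For rare categories, the macro average or the positive-class row of the report is usually the more informative number.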

### 6. Improve your model
Use grid search to find better parameters. 

In [15]:

# Define the parameter grid for grid search
parameters_lr = {
    'clf__estimator__C': [0.1, 1, 10],  # Regularization strength
    'clf__estimator__max_iter': [1000, 2000]  # Ensure at least 1000 iterations for convergence
}

# Initialize GridSearchCV
grid_search = GridSearchCV(pipeline_lr, param_grid=parameters_lr, verbose=2, n_jobs=1, cv=3)

# Fit the grid search to the data
grid_search.fit(X_train, Y_train_filtered)

# Display the best parameters found by GridSearchCV
print("Best Parameters:", grid_search.best_params_)


Fitting 3 folds for each of 6 candidates, totalling 18 fits
[CV] END clf__estimator__C=0.1, clf__estimator__max_iter=1000; total time= 1.9min
[CV] END clf__estimator__C=0.1, clf__estimator__max_iter=1000; total time= 1.9min
[CV] END clf__estimator__C=0.1, clf__estimator__max_iter=1000; total time= 2.0min
[CV] END clf__estimator__C=0.1, clf__estimator__max_iter=2000; total time= 1.9min
[CV] END clf__estimator__C=0.1, clf__estimator__max_iter=2000; total time= 1.9min
[CV] END clf__estimator__C=0.1, clf__estimator__max_iter=2000; total time= 1.9min
[CV] END .clf__estimator__C=1, clf__estimator__max_iter=1000; total time= 3.0min
[CV] END .clf__estimator__C=1, clf__estimator__max_iter=1000; total time= 3.1min
[CV] END .clf__estimator__C=1, clf__estimator__max_iter=1000; total time= 3.4min
[CV] END .clf__estimator__C=1, clf__estimator__max_iter=2000; total time= 3.1min
[CV] END .clf__estimator__C=1, clf__estimator__max_iter=2000; total time= 3.1min
[CV] END .clf__estimator__C=1, clf__estimat

Best Parameters: {'clf__estimator__C': 1, 'clf__estimator__max_iter': 1000}
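
The parameter names in the grid follow scikit-learn's `step__parameter` convention: `clf` is the pipeline step, `estimator` is the model wrapped by `MultiOutputClassifier`, and `C` is the `LogisticRegression` argument. The sketch below, using a stripped-down pipeline with only the classifier step, checks that the names resolve and reproduces the candidate count from the log above.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import ParameterGrid
from sklearn.multioutput import MultiOutputClassifier
from sklearn.pipeline import Pipeline

# Simplified pipeline: only the classifier step, enough to inspect parameter names.
demo_pipeline = Pipeline([
    ("clf", MultiOutputClassifier(LogisticRegression())),
])

grid = {
    "clf__estimator__C": [0.1, 1, 10],
    "clf__estimator__max_iter": [1000, 2000],
}

# Every key in the grid must be a parameter the pipeline exposes.
valid_names = all(name in demo_pipeline.get_params() for name in grid)

# 3 values of C x 2 values of max_iter, each evaluated on 3 folds.
n_candidates = len(ParameterGrid(grid))
n_fits = n_candidates * 3
```

Checking the names against `get_params()` before launching a search that takes minutes per fit is a cheap way to catch typos in the grid.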

### 7. Test your model
Show the accuracy, precision, and recall of the tuned model.  

Since this project focuses on code quality, process, and pipelines, there is no minimum performance metric needed to pass. However, make sure to fine-tune your models for accuracy, precision, and recall to make your project stand out, especially for your portfolio!

In [16]:
# Take the best pipeline found by GridSearchCV
best_pipeline_lr = grid_search.best_estimator_

# GridSearchCV (default refit=True) has already refit this pipeline on the full
# training set, so the explicit fit below is redundant but harmless
best_pipeline_lr.fit(X_train, Y_train_filtered)

# Make predictions on the test set
Y_pred_filtered = best_pipeline_lr.predict(X_test)

# Initialize lists to store precision, recall, and f1 scores for each category
lr_precisions = []
lr_recalls = []
lr_f1s = []

# Now, test the model using classification_report and other metrics
for i, column in enumerate(Y_test_filtered.T):
    print(f"Category {i}:")

    # Generate the classification report for each category
    report = classification_report(Y_test_filtered[:, i], Y_pred_filtered[:, i], output_dict=True, zero_division=0)
    
    # Print detailed classification report for each category
    print(classification_report(Y_test_filtered[:, i], Y_pred_filtered[:, i], zero_division=0))

    # Append 'weighted avg' precision, recall, and f1-score to the lists
    lr_precisions.append(report['weighted avg']['precision'])
    lr_recalls.append(report['weighted avg']['recall'])
    lr_f1s.append(report['weighted avg']['f1-score'])

# Calculate overall average metrics for Logistic Regression
avg_lr_precision = sum(lr_precisions) / len(lr_precisions)
avg_lr_recall = sum(lr_recalls) / len(lr_recalls)
avg_lr_f1 = sum(lr_f1s) / len(lr_f1s)

# Print overall metrics
print(f"Overall Logistic Regression Precision: {avg_lr_precision:.4f}")
print(f"Overall Logistic Regression Recall: {avg_lr_recall:.4f}")
print(f"Overall Logistic Regression F1-Score: {avg_lr_f1:.4f}")


Category 0:
              precision    recall  f1-score   support

           0       0.71      0.45      0.55      1873
           1       0.84      0.94      0.89      5934
           2       0.00      0.00      0.00        58

    accuracy                           0.82      7865
   macro avg       0.52      0.47      0.48      7865
weighted avg       0.80      0.82      0.80      7865

Category 1:
              precision    recall  f1-score   support

           0       0.91      0.98      0.94      6533
           1       0.83      0.53      0.65      1332

    accuracy                           0.90      7865
   macro avg       0.87      0.75      0.80      7865
weighted avg       0.90      0.90      0.89      7865

Category 2:
              precision    recall  f1-score   support

           0       1.00      1.00      1.00      7829
           1       0.00      0.00      0.00        36

    accuracy                           1.00      7865
   macro avg       0.50      0.50     

The overall performance of the Logistic Regression model is summarized as follows: it achieved a precision of 93.89%, a recall of 94.71%, and an F1-Score of 93.63%.

The machine learning model for categorizing disaster response messages performed well overall, with high accuracy and precision across most categories.

- Overall Accuracy: 94.41%
- Overall Precision: 93.29%
- Overall Recall: 94.41%

Key Categories:
- Related: Precision 84%, Recall 90%, F1-score 87%
- Request: Precision 81%, Recall 43%, F1-score 57%
- Aid Related: Precision 73%, Recall 62%, F1-score 67%

Notable Observations:
- Strong Performance: Categories like 'related', 'aid_related', and 'weather_related' have high precision and recall, indicating the model can effectively classify these common categories.
- Weak Performance: Categories such as 'medical_help' and 'direct_report' had low recall (9% and 34%, respectively), suggesting the model struggles with these less frequent or more complex categories.
- Imbalance Impact: Categories with very few samples, such as 'offer' and 'tools', had ill-defined precision and F1-scores, signaling the model's difficulty in predicting these rare categories.

### 8. Try improving your model further. Here are a few ideas:
* try other machine learning algorithms
* add other features besides the TF-IDF
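
One way to add a feature besides TF-IDF is to combine the text vectorizer with a custom transformer in a `FeatureUnion`. The sketch below is an assumption-laden illustration: `MessageLengthExtractor` is a hypothetical transformer that emits the word count of each message, and nothing here has been evaluated on the disaster dataset.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import FeatureUnion


class MessageLengthExtractor(BaseEstimator, TransformerMixin):
    """Hypothetical transformer: one column holding each message's word count."""

    def fit(self, X, y=None):
        # Nothing to learn; present only to satisfy the transformer interface.
        return self

    def transform(self, X):
        return np.array([[len(text.split())] for text in X], dtype=float)


features = FeatureUnion([
    ("tfidf", TfidfVectorizer()),
    ("length", MessageLengthExtractor()),
])

messages = ["we need water", "storm destroyed the bridge near town"]
matrix = features.fit_transform(messages)

# The combined matrix has one column per TF-IDF term plus the length column.
n_tfidf_terms = len(features.transformer_list[0][1].vocabulary_)
```

In a full pipeline, `features` would take the place of the text-only step and be followed by the classifier. Word counts are on a different scale from TF-IDF weights, so scaling the extra column may be worth trying.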

# Random Forest

In [12]:

# Pipeline for processing the message (text) with TF-IDF and Random Forest
pipeline_rf = Pipeline([
    ('text_pipeline', Pipeline([
        ('vect', CountVectorizer(tokenizer=tokenize)),  # Tokenize the message column
        ('tfidf', TfidfTransformer())                  # Transform into TF-IDF
    ])),
    ('clf', MultiOutputClassifier(RandomForestClassifier()))  # Apply Random Forest for multi-output classification
])

# Train the Random Forest model
pipeline_rf.fit(X_train, Y_train_filtered)

# Make predictions on the test set
Y_pred_filtered = pipeline_rf.predict(X_test)

# Evaluate the model using classification_report for each category
for i, column in enumerate(Y_test_filtered.T):
    print(f"Category {i}:")
    print(classification_report(Y_test_filtered[:, i], Y_pred_filtered[:, i], zero_division=0))

# Calculate overall precision, recall, and F1-score across all categories
rf_precisions = []
rf_recalls = []
rf_f1s = []

# Iterate through each category
for i, column in enumerate(Y_test_filtered.T):
    report = classification_report(Y_test_filtered[:, i], Y_pred_filtered[:, i], output_dict=True, zero_division=0)
    rf_precisions.append(report['weighted avg']['precision'])
    rf_recalls.append(report['weighted avg']['recall'])
    rf_f1s.append(report['weighted avg']['f1-score'])

# Calculate overall average metrics for Random Forest
avg_rf_precision = sum(rf_precisions) / len(rf_precisions)
avg_rf_recall = sum(rf_recalls) / len(rf_recalls)
avg_rf_f1 = sum(rf_f1s) / len(rf_f1s)

# Print overall metrics
print(f"Overall Random Forest Precision: {avg_rf_precision:.4f}")
print(f"Overall Random Forest Recall: {avg_rf_recall:.4f}")
print(f"Overall Random Forest F1-Score: {avg_rf_f1:.4f}")


Category 0:
              precision    recall  f1-score   support

           0       0.70      0.41      0.51      1873
           1       0.83      0.94      0.88      5934
           2       0.37      0.29      0.33        58

    accuracy                           0.81      7865
   macro avg       0.63      0.55      0.57      7865
weighted avg       0.79      0.81      0.79      7865

Category 1:
              precision    recall  f1-score   support

           0       0.91      0.98      0.94      6533
           1       0.84      0.50      0.62      1332

    accuracy                           0.90      7865
   macro avg       0.87      0.74      0.78      7865
weighted avg       0.89      0.90      0.89      7865

Category 2:
              precision    recall  f1-score   support

           0       1.00      1.00      1.00      7829
           1       0.00      0.00      0.00        36

    accuracy                           1.00      7865
   macro avg       0.50      0.50     

The Random Forest model achieved an overall weighted-average precision of 93.61%, a recall of 94.63%, and an F1-score of 93.41%. As with Logistic Regression, these averages are dominated by the majority class in each category, so the per-category reports remain the better guide to performance on rare labels.

# Evaluate and Compare Models

Both models classify well and perform almost identically. Random Forest achieved a precision of 93.61%, a recall of 94.63%, and an F1-score of 93.41%; Logistic Regression achieved a precision of 93.89%, a recall of 94.71%, and an F1-score of 93.63%. The differences are minimal (under 0.3 percentage points on every metric), but Logistic Regression is marginally better on all three, so it may be the more effective model for this task.

### 9. Export your model as a pickle file

In [57]:
import pickle

# Define the pipeline with the custom tokenize function and Logistic Regression model with best parameters
pipeline_lr = Pipeline([
    ('vect', TfidfVectorizer(tokenizer=tokenize)),  # Use custom tokenize function
    ('clf', MultiOutputClassifier(LogisticRegression(C=1, max_iter=1000)))  # Apply Logistic Regression
])


# Fit the model on your training data
pipeline_lr.fit(X_train, Y_train_filtered)



# Save the Logistic Regression pipeline model to a pickle file
with open('logistic_regression_pipeline.pkl', 'wb') as file:
    pickle.dump(pipeline_lr, file)


In [16]:
import joblib

# Define the pipeline with the custom tokenize function and Logistic Regression model with best parameters
pipeline_lr = Pipeline([
    ('vect', TfidfVectorizer(tokenizer=tokenize, token_pattern=None)),  # Use custom tokenize function
    ('clf', MultiOutputClassifier(LogisticRegression(C=1, max_iter=1000)))  # Apply Logistic Regression
])

# Fit the model on your training data
pipeline_lr.fit(X_train, Y_train_filtered)

# Save the Logistic Regression pipeline model to a file
joblib.dump(pipeline_lr, 'logistic_regression_pipeline.pkl', compress=1)


['logistic_regression_pipeline.pkl']
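
A quick way to confirm an exported model is usable is to load it back and compare predictions. The sketch below does this for a toy pipeline with made-up data and a hypothetical file name in a temporary directory; it uses the default tokenizer, so it sidesteps the custom-function pickling issue described earlier.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multioutput import MultiOutputClassifier
from sklearn.pipeline import Pipeline

messages = ["we need water", "storm is coming", "need water after storm", "all is fine"]
labels = np.array([[1, 0], [0, 1], [1, 1], [0, 0]])  # hypothetical label columns

model = Pipeline([
    ("vect", TfidfVectorizer()),
    ("clf", MultiOutputClassifier(LogisticRegression(C=1, max_iter=1000))),
]).fit(messages, labels)

# Save to a temporary file, then load the model back from disk.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "demo_pipeline.pkl")
    joblib.dump(model, path, compress=1)
    restored = joblib.load(path)

# The restored pipeline should reproduce the original predictions exactly.
same_predictions = np.array_equal(model.predict(messages), restored.predict(messages))
```

Running the same check in a fresh Python process is a stricter test, since that is where a missing `tokenize` import would surface.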

### 10. Use this notebook to complete `train_classifier.py`
Use the template file attached in the Resources folder to write a script that runs the steps above to create a database and export a model based on a new dataset specified by the user.

In [None]:
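# Import necessary packages
import sys
import pandas as pd
import joblib
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.multioutput import MultiOutputClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report

# Assumption: tokenize() lives in a local module named utils.py, as in the
# commented-out import earlier in this notebook; adjust to your own file name
# (a local tokenize.py would shadow the standard-library module of that name)
from utils import tokenize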

def load_data(data_file):
    """
    Load the data from the specified CSV file, clean it, and return the features and labels.

    Args:
    data_file (str): The file path of the CSV file containing the dataset.

    Returns:
    X (DataFrame): The features (messages).
    y (DataFrame): The labels (categories).
    """
    # Load dataset
    df = pd.read_csv(data_file)
    
    # 'message' is the feature column; columns from index 4 onward are labels
    X = df['message']
    y = df.iloc[:, 4:]  # Adjust based on the column index where labels start

    # Drop label columns with a single class; LogisticRegression cannot fit them
    y = y.loc[:, y.nunique() > 1]

    return X, y

def build_model():
    """
    Build a machine learning pipeline with TfidfVectorizer and Logistic Regression,
    and set up a GridSearchCV for hyperparameter tuning.

    Returns:
    model (GridSearchCV): The model pipeline with GridSearchCV.
    """
    pipeline = Pipeline([
        ('vect', TfidfVectorizer(tokenizer=tokenize, token_pattern=None)),
        ('clf', MultiOutputClassifier(LogisticRegression(max_iter=1000)))
    ])

    # Define hyperparameters for GridSearchCV
    parameters = {
        'vect__max_df': [0.9, 1.0],
        'clf__estimator__C': [1, 10]
    }

    # Grid search for hyperparameter tuning
    model_pipeline = GridSearchCV(pipeline, param_grid=parameters, verbose=3, cv=3)

    return model_pipeline

def train(X, y, model):
    """
    Train the model on the training data, evaluate it on the test set,
    and output classification reports for each category.

    Args:
    X (DataFrame): The features.
    y (DataFrame): The labels.
    model (GridSearchCV): The model pipeline.

    Returns:
    model: The trained model.
    """
    # Split data into training and test sets
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
    
    # Train the model
    model.fit(X_train, y_train)
    
    # Predict on the test set
    y_pred = model.predict(X_test)
    
    # Output classification report for each category
    for i, col in enumerate(y.columns):
        print(f'Category: {col}')
        print(classification_report(y_test.iloc[:, i], y_pred[:, i], zero_division=0))
    
    return model

def export_model(model, model_filepath):
    """
    Save the trained model as a pickle file.

    Args:
    model: The trained model.
    model_filepath (str): The file path where the model should be saved.
    """
    joblib.dump(model, model_filepath)

def run_pipeline(data_file, model_filepath):
    """
    Execute the full pipeline: load data, build the model, train it, and export the model.

    Args:
    data_file (str): The file path of the CSV dataset.
    model_filepath (str): The file path where the model should be saved.
    """
    X, y = load_data(data_file)
    model = build_model()
    model = train(X, y, model)
    export_model(model, model_filepath)

if __name__ == '__main__':
    if len(sys.argv) == 3:
        data_file = sys.argv[1]
        model_filepath = sys.argv[2]
        
        print(f'Running pipeline for dataset: {data_file}')
        run_pipeline(data_file, model_filepath)
        print(f'Model saved to: {model_filepath}')
    else:
        print('Please provide the filepath of the dataset and the filepath to save the model as arguments.')
        print('Example: python train_classifier.py data/messages.csv model.pkl')
