# 1. Project Title: [Data Classification]
___

#### a. Introduction

- **Objective:** Clearly state the goal of your project. What problem are you trying to solve?
- **Background:** Provide context on why this problem is important or interesting. Mention any relevant research, datasets, or industry relevance.
- **Scope:** Define the boundaries of your project. What will be included, and what will be out of scope?

#### b. Project Overview

- **Project Summary:** A brief overview of the project, including the main steps you will take to achieve the objective.
- **Milestones:** Outline the key milestones or phases of the project. For example:
  - Data Collection
  - Data Preprocessing
  - Model Selection
  - Model Training and Evaluation
  - Results and Conclusion


#### c. About the Author

- **Name:** [Ahmed Ferganey]
- **Background:** Junior Data Scientist and Machine Learning Engineer with a strong foundation in embedded systems, industrial engineering, and supply chain management. Knowledgeable in statistical analysis, NLP, Computer Vision, and deep learning, with hands-on experience in Python, SQL, and Docker.
- **Motivation:** Why are you interested in this project? What do you hope to learn or achieve?
- **Contact:** [LinkedIn profile](https://www.linkedin.com/in/ahmed-ferganey/)



#### d. Tools and Technologies

- **Programming Languages:** List the programming languages you will use (e.g., Python).
- **Libraries and Frameworks:** List the specific libraries and frameworks you will use (e.g., TensorFlow, scikit-learn).
- **Software and Tools:** Mention any software or tools necessary for the project (e.g., Jupyter Notebook, Git).

#### e. Dataset Description

- **Dataset Name:** [Name of the Dataset]
- **Source:** Where did you obtain the dataset? Include a link if possible.
- **Description:** Briefly describe the dataset, including the number of features, the target variable, and any other important details.
- **Data Preprocessing:** Outline any preprocessing steps you anticipate, such as data cleaning, normalization, or feature engineering.

#### f. Methodology

- **Model Selection:** Describe the types of models you are considering and why.
- **Evaluation Metrics:** Define how you will evaluate your models' performance (e.g., accuracy, F1-score).
- **Validation Strategy:** Explain how you will validate your models, such as cross-validation or a train/test split.
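
As a minimal sketch of the metrics and validation bullets above, the snippet below runs stratified 5-fold cross-validation and reports accuracy and F1. The synthetic dataset from `make_classification`, the choice of `LogisticRegression`, and the fold count are all assumptions made for illustration; none of them is taken from this project.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_validate

# Synthetic stand-in data (assumption): 200 samples, 10 features, 2 classes.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# Stratified folds keep the class ratio roughly equal in every split.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Score the same model on accuracy and F1 in a single pass.
results = cross_validate(
    LogisticRegression(max_iter=1000),
    X,
    y,
    cv=cv,
    scoring=["accuracy", "f1"],
)

mean_accuracy = results["test_accuracy"].mean()
mean_f1 = results["test_f1"].mean()
print(f"accuracy: {mean_accuracy:.3f}, F1: {mean_f1:.3f}")
```

Reporting the mean across folds gives a steadier estimate than a single split, at the cost of training the model once per fold.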


### 2. importing libraries
___



In [1]:

# Standard library
import io
import sys

# Data handling
import numpy as np
import pandas as pd

# Plotting
import matplotlib.pyplot as plt
import plotly.express as px
import seaborn as sns

# Preprocessing, feature selection, and dimensionality reduction
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, SelectPercentile, chi2, f_classif
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

# Classifiers
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis, QuadraticDiscriminantAnalysis
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression, SGDClassifier
from sklearn.naive_bayes import BernoulliNB, GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Metrics
from sklearn.metrics import classification_report, confusion_matrix

# Notebook display settings
%matplotlib inline
pd.set_option('display.max_columns', None)  # show every column when printing a DataFrame
sns.set(style="whitegrid")


### 3. reading the raw data
___

In [2]:
base_path = '/media/ahmed-ferganey/AI4/01-Learning_AI'
sub_folder_one = 'Movies'
sub_folder_two = 'Json'
MoviesFilesDir = f'{base_path}/MyGitHub/HeshamAsem_ML_App3/{sub_folder_one}'
JsonFilesDir = f'{base_path}/MyGitHub/HeshamAsem_ML_App3/{sub_folder_two}'
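
The same two directories can also be built with `pathlib`, which avoids hand-written separators. This is only a sketch of an alternative: the `project_root` name is introduced here for illustration, and the path components are copied from the cell above.

```python
from pathlib import Path

# Path components copied from the cell above.
base_path = Path('/media/ahmed-ferganey/AI4/01-Learning_AI')
project_root = base_path / 'MyGitHub' / 'HeshamAsem_ML_App3'  # hypothetical helper name

MoviesFilesDir = project_root / 'Movies'
JsonFilesDir = project_root / 'Json'

# pandas accepts Path objects directly, e.g. pd.read_csv(MoviesFilesDir / 'credits.csv').
print(MoviesFilesDir)
```

Because a `Path` formats as a plain string inside an f-string, later cells that interpolate `MoviesFilesDir` keep working unchanged.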


In [5]:
# Build the path to credits.csv inside the Movies directory, then load it
credits_path = f'{MoviesFilesDir}/credits.csv'
credits = pd.read_csv(credits_path)

In [8]:
credits

| | cast | crew | id |
|---|---|---|---|
| 0 | `[{'cast_id': 14, 'character': 'Woody (voice)',...` | `[{'credit_id': '52fe4284c3a36847f8024f49', 'de...` | 862 |
| 1 | `[{'cast_id': 1, 'character': 'Alan Parrish', '...` | `[{'credit_id': '52fe44bfc3a36847f80a7cd1', 'de...` | 8844 |
| 2 | `[{'cast_id': 2, 'character': 'Max Goldman', 'c...` | `[{'credit_id': '52fe466a9251416c75077a89', 'de...` | 15602 |
| 3 | `[{'cast_id': 1, 'character': "Savannah 'Vannah...` | `[{'credit_id': '52fe44779251416c91011acb', 'de...` | 31357 |
| 4 | `[{'cast_id': 1, 'character': 'George Banks', '...` | `[{'credit_id': '52fe44959251416c75039ed7', 'de...` | 11862 |
| ... | ... | ... | ... |
| 45471 | `[{'cast_id': 0, 'character': '', 'credit_id': ...` | `[{'credit_id': '5894a97d925141426c00818c', 'de...` | 439050 |
| 45472 | `[{'cast_id': 1002, 'character': 'Sister Angela...` | `[{'credit_id': '52fe4af1c3a36847f81e9b15', 'de...` | 111109 |
| 45473 | `[{'cast_id': 6, 'character': 'Emily Shaw', 'cr...` | `[{'credit_id': '52fe4776c3a368484e0c8387', 'de...` | 67758 |
| 45474 | `[{'cast_id': 2, 'character': '', 'credit_id': ...` | `[{'credit_id': '533bccebc3a36844cf0011a7', 'de...` | 227506 |


In [9]:
credits.iloc[0,0]

"[{'cast_id': 14, 'character': 'Woody (voice)', 'credit_id': '52fe4284c3a36847f8024f95', 'gender': 2, 'id': 31, 'name': 'Tom Hanks', 'order': 0, 'profile_path': '/pQFoyx7rp09CJTAb932F2g8Nlho.jpg'}, {'cast_id': 15, 'character': 'Buzz Lightyear (voice)', 'credit_id': '52fe4284c3a36847f8024f99', 'gender': 2, 'id': 12898, 'name': 'Tim Allen', 'order': 1, 'profile_path': '/uX2xVf6pMmPepxnvFWyBtjexzgY.jpg'}, {'cast_id': 16, 'character': 'Mr. Potato Head (voice)', 'credit_id': '52fe4284c3a36847f8024f9d', 'gender': 2, 'id': 7167, 'name': 'Don Rickles', 'order': 2, 'profile_path': '/h5BcaDMPRVLHLDzbQavec4xfSdt.jpg'}, {'cast_id': 17, 'character': 'Slinky Dog (voice)', 'credit_id': '52fe4284c3a36847f8024fa1', 'gender': 2, 'id': 12899, 'name': 'Jim Varney', 'order': 3, 'profile_path': '/eIo2jVVXYgjDtaHoF19Ll9vtW7h.jpg'}, {'cast_id': 18, 'character': 'Rex (voice)', 'credit_id': '52fe4284c3a36847f8024fa5', 'gender': 2, 'id': 12900, 'name': 'Wallace Shawn', 'order': 4, 'profile_path': '/oGE6JqPP2xH4t

In [10]:
type(credits.iloc[0,0])

str
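
Because `type(...)` returns `str`, the `cast` cell is a string that merely looks like a list of dictionaries. One possible way to turn it into real Python objects is `ast.literal_eval`, sketched below on a shortened sample in the same format as the output above. The sample keeps only two actors and omits the `credit_id` and `profile_path` keys for brevity, and the `parse_cast` helper name is hypothetical.

```python
import ast

# Shortened sample in the same format as credits.iloc[0, 0].
raw_cast = (
    "[{'cast_id': 14, 'character': 'Woody (voice)', 'gender': 2, "
    "'id': 31, 'name': 'Tom Hanks', 'order': 0}, "
    "{'cast_id': 15, 'character': 'Buzz Lightyear (voice)', 'gender': 2, "
    "'id': 12898, 'name': 'Tim Allen', 'order': 1}]"
)


def parse_cast(text):
    """Safely evaluate a Python-literal string into a list of dicts."""
    return ast.literal_eval(text)


cast = parse_cast(raw_cast)
names = [member['name'] for member in cast]
print(type(cast).__name__, names)  # list ['Tom Hanks', 'Tim Allen']
```

`json.loads` would reject this string because the keys and values use single quotes, which is why the sketch uses `ast.literal_eval` instead. On the full DataFrame the same helper could be applied with `credits['cast'].apply(parse_cast)`, although whether every row parses cleanly has not been checked here.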

### 4. data analysis
___

### 5. data cleaning
___

##### 5.1 finding nulls
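
A minimal sketch of a null check, using a small made-up DataFrame rather than the project data. The column names mirror `credits`, but the values and the missing entries are invented for illustration.

```python
import numpy as np
import pandas as pd

# Toy frame (assumption): values are invented for illustration only.
toy = pd.DataFrame({
    'cast': ['[]', None, "[{'name': 'A'}]"],
    'crew': ['[]', '[]', None],
    'id': [1, 2, np.nan],
})

# Count missing values per column, then express them as a percentage of rows.
null_counts = toy.isna().sum()
null_percent = (toy.isna().mean() * 100).round(1)

print(pd.DataFrame({'nulls': null_counts, 'percent': null_percent}))
```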

##### 5.2 outliers
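
One common way to flag outliers in a numeric column is Tukey's interquartile-range rule. The sketch below applies it to a handful of invented numbers, not to the project data; the 1.5 multiplier is the conventional default rather than a value chosen for this dataset.

```python
import pandas as pd

# Toy values (assumption): invented numbers with one obvious extreme.
values = pd.Series([10, 12, 11, 13, 12, 11, 95])

# Tukey's rule: flag points more than 1.5 * IQR outside the quartiles.
q1, q3 = values.quantile(0.25), values.quantile(0.75)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

outliers = values[(values < lower) | (values > upper)]
print(outliers.tolist())  # [95]
```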

##### 5.3 feature extraction

##### 5.4 feature selection
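
The imports at the top of the notebook include `SelectKBest` and `f_classif`. A minimal sketch of how they fit together is shown below on synthetic data; the dataset, the number of features, and `k=3` are assumptions for illustration only.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif

# Synthetic stand-in data (assumption): 10 features, only 3 of them informative.
X, y = make_classification(
    n_samples=200, n_features=10, n_informative=3, n_redundant=0, random_state=0
)

# Keep the k features with the highest ANOVA F-scores against the target.
selector = SelectKBest(score_func=f_classif, k=3)
X_selected = selector.fit_transform(X, y)

print(X_selected.shape)                    # (200, 3)
print(selector.get_support(indices=True))  # indices of the kept columns
```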

### 6. visualization
___

### 7. building the model
___

### 8. evaluating the model
___
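
A minimal sketch of an evaluation step using the `confusion_matrix` and `classification_report` functions imported earlier. The synthetic data, the 80/20 split, and the `RandomForestClassifier` settings are assumptions for illustration and do not represent results from this project.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption), split 80/20 with class ratios preserved.
X, y = make_classification(n_samples=300, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Rows are true classes, columns are predicted classes.
matrix = confusion_matrix(y_test, y_pred)
print(matrix)
print(classification_report(y_test, y_pred))
```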