# Gender Recognition by Voice Machine Learning SVM

### Used Libraries
1. NumPy (Numerical Python)
2. Pandas
3. Matplotlib
4. Seaborn
5. Scikit-learn
6. Missingno

In [None]:
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# visualization tools
import matplotlib.pyplot as plt
import seaborn as sns
from plotly.offline import init_notebook_mode, iplot
init_notebook_mode(connected=True)

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

### Content:
1. Missingno - Missing Data
2. Seaborn - Heatmap
3. Separating Features and Labels
4. Converting String Value To int Type for Labels
5. Data Standardisation
6. Splitting Dataset into Training Set and Testing Set
7. Build SVM Model with Default Hyperparameter
8. Accuracy Score
9. Confusion Matrix
10. F1 Score

### Reading Data

In [None]:
df=pd.read_csv("../input/voicegender/voice.csv")

In [None]:
df.sample(5)

In [None]:
df.info()

# Missingno - Missing Data

In [None]:
import missingno as msno
msno.matrix(df)
plt.show()
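
The matrix plot gives a visual overview; a per-column count of missing values is a simple numeric cross-check. The sketch below runs on a small made-up frame (the `toy` values are illustrative only and are not taken from `voice.csv`):

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame; the values are illustrative, not from voice.csv.
toy = pd.DataFrame({
    "meanfreq": [0.18, np.nan, 0.21, 0.15],
    "sd": [0.06, 0.05, np.nan, np.nan],
    "label": ["male", "female", "female", "male"],
})

# Number of missing values in each column
missing_per_column = toy.isnull().sum()
print(missing_per_column)

# Total number of missing cells in the whole frame
total_missing = int(missing_per_column.sum())
print("Total missing:", total_missing)
```

The same two calls can be applied to `df` to confirm what the matrix plot suggests.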

# Seaborn - Heatmap
### Relationship between columns
* 1 --> Perfect positive correlation (direct proportion)
* 0 --> No linear relationship
* -1 --> Perfect negative correlation (inverse proportion)

In [None]:
f,ax = plt.subplots(figsize=(25, 15))
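# Correlate numeric columns only; the string "label" column cannot be correlated
sns.heatmap(df.select_dtypes(include="number").corr(), annot=True, linewidths=0.5, linecolor="red", fmt=".1f", ax=ax)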
plt.show()
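
To make the scale of the heatmap concrete, here is a minimal sketch of the Pearson correlation that `corr()` computes by default, on a hypothetical toy frame (the column names `x`, `up`, and `down` are invented for illustration):

```python
import pandas as pd

# Hypothetical toy data, not from voice.csv.
toy = pd.DataFrame({
    "x": [1.0, 2.0, 3.0, 4.0],
    "up": [2.0, 4.0, 6.0, 8.0],    # rises exactly in step with x
    "down": [8.0, 6.0, 4.0, 2.0],  # falls exactly in step with x
    "label": ["male", "female", "male", "female"],  # non-numeric column
})

# Keep numeric columns only, then compute pairwise Pearson correlations
corr = toy.select_dtypes(include="number").corr()
print(corr.round(2))
```

Here `x` and `up` correlate at 1, while `x` and `down` correlate at -1; real features usually fall somewhere in between.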

# Separating Features and Labels

In [None]:
X=df.iloc[:, :-1]
X.head()

# Converting String Value To int Type for Labels
### Encode label category
* Male -> 1
* Female -> 0

In [None]:
from sklearn.preprocessing import LabelEncoder
y=df.iloc[:,-1]

encoder = LabelEncoder()
y = encoder.fit_transform(y)
print(y)
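
`LabelEncoder` assigns integer codes in sorted order of the class names, which is why "female" maps to 0 and "male" to 1. A small sketch, assuming the labels are the lowercase strings shown here, that makes the mapping explicit:

```python
from sklearn.preprocessing import LabelEncoder

# Illustrative labels; assumed to match the spelling used in the dataset.
labels = ["male", "female", "female", "male"]

enc = LabelEncoder()
encoded = enc.fit_transform(labels)

# classes_ is sorted, and each class's position is its integer code
mapping = {str(cls): int(code) for code, cls in enumerate(enc.classes_)}
print(mapping)

# inverse_transform turns integer codes back into the original strings
print(enc.inverse_transform([0, 1]))
```

Inspecting `encoder.classes_` in the notebook gives the same information for the real labels.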

# Data Standardisation
### Rescale each feature to zero mean and unit variance

In [None]:
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
scaler.fit(X)
X = scaler.transform(X)
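
`StandardScaler` subtracts each column's mean and divides by its standard deviation, so the result is centered and has unit variance but is not confined to a fixed range. A minimal sketch on made-up numbers (the `values` array is purely illustrative):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# One illustrative feature with a single large outlier
values = np.array([[1.0], [2.0], [3.0], [4.0], [100.0]])

scaled = StandardScaler().fit_transform(values)

print("mean:", scaled.mean(axis=0))  # approximately 0
print("std:", scaled.std(axis=0))    # approximately 1
print("max:", scaled.max())          # can exceed 1 for outliers
```

If a bounded range is actually required, scikit-learn's `MinMaxScaler` is the usual tool for that instead.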

# Splitting Dataset into Training Set and Testing Set

In [None]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)
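
Note that the scaler above was fitted on the full dataset before the split, so statistics from the test rows influence the scaling. A common alternative is to split first and fit the scaler on the training portion only. The sketch below shows that ordering on synthetic data; `make_classification`, the `stratify` argument, and the `_demo` names are assumptions for illustration and are not part of the original notebook:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the voice features (illustrative only)
X_demo, y_demo = make_classification(n_samples=200, n_features=5, random_state=1)

# Split first; stratify keeps the class ratio similar in both parts
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=1, stratify=y_demo
)

# Fit the scaler on the training rows only, then apply it to both parts
scaler_demo = StandardScaler().fit(X_tr)
X_tr_s = scaler_demo.transform(X_tr)
X_te_s = scaler_demo.transform(X_te)

print(X_tr_s.shape, X_te_s.shape)
```

With this ordering the test set stays unseen until evaluation, which tends to give a more honest estimate of performance.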

# Build SVM Model with Default Hyperparameter

In [None]:
from sklearn.svm import SVC
from sklearn import metrics
svc=SVC() #Default hyperparameters
svc.fit(X_train,y_train)
y_pred=svc.predict(X_test)
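
"Default hyperparameters" can be inspected directly with `get_params()`. In recent scikit-learn releases the defaults are an RBF kernel with `C=1.0`; the exact values may differ between versions, so the sketch prints them rather than assuming them:

```python
from sklearn.svm import SVC

# Collect the hyperparameters of an untrained, default-configured SVC
defaults = SVC().get_params()

# Print the three settings that most affect the decision boundary
for name in ("C", "kernel", "gamma"):
    print(name, "=", defaults[name])
```

These are the values a later hyperparameter search would typically vary.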

# Accuracy Score

In [None]:
print('Accuracy Score:')
print(metrics.accuracy_score(y_test,y_pred))
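
Accuracy is simply the fraction of predictions that match the true labels. A tiny sketch with made-up labels (the two `_demo` lists are illustrative) shows that the manual computation agrees with `accuracy_score`:

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Illustrative labels and predictions
y_true_demo = np.array([0, 0, 1, 1, 1])
y_pred_demo = np.array([0, 1, 1, 1, 0])

# Fraction of positions where prediction equals truth
manual_accuracy = float(np.mean(y_true_demo == y_pred_demo))
library_accuracy = float(accuracy_score(y_true_demo, y_pred_demo))

print(manual_accuracy, library_accuracy)
```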

# Confusion Matrix

In [None]:
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)
print(cm)
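
For a binary problem, scikit-learn lays out the confusion matrix with true labels as rows and predicted labels as columns, so `ravel()` yields the counts in the order TN, FP, FN, TP. A minimal sketch with made-up labels (1 standing for male, 0 for female, as encoded above):

```python
from sklearn.metrics import confusion_matrix

# Illustrative labels and predictions
y_true_demo = [0, 0, 0, 1, 1, 1, 1, 0]
y_pred_demo = [0, 1, 0, 1, 1, 0, 1, 0]

# Rows are true classes, columns are predicted classes
tn, fp, fn, tp = confusion_matrix(y_true_demo, y_pred_demo).ravel()

print("TN:", tn, "FP:", fp, "FN:", fn, "TP:", tp)
```

Applied to `cm` from the notebook, `cm.ravel()` unpacks in the same order.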

# F1 Score

In [None]:
from sklearn.metrics import f1_score
# Store the result under a different name so the f1_score function is not overwritten
f1 = f1_score(y_test, y_pred)
print("F1 Score:")
print(f1)
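
The F1 score is the harmonic mean of precision and recall for the positive class (label 1 by default). A small sketch with made-up labels, checking the formula against scikit-learn's result:

```python
from sklearn.metrics import f1_score, precision_score, recall_score

# Illustrative labels and predictions: 3 TP, 2 FP, 1 FN
y_true_demo = [0, 0, 0, 1, 1, 1, 1, 0]
y_pred_demo = [0, 1, 0, 1, 1, 0, 1, 1]

precision = precision_score(y_true_demo, y_pred_demo)  # TP / (TP + FP)
recall = recall_score(y_true_demo, y_pred_demo)        # TP / (TP + FN)

# Harmonic mean of precision and recall
manual_f1 = 2 * precision * recall / (precision + recall)
library_f1 = f1_score(y_true_demo, y_pred_demo)

print(precision, recall, manual_f1, library_f1)
```

Because it combines both error types, F1 is often more informative than accuracy when the two classes are not equally frequent.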

# Thank You

If you have any suggestions, advice, or feedback, I would be glad to hear them.
### Also there are other visualization kernels
* [FIFA 19 Player Data Analysis and Visualization EDA](https://www.kaggle.com/ismailsefa/f-fa-19-player-data-analysis-and-visualization-eda)
* [Crimes Data Analysis and Visualzation (EDA)](https://www.kaggle.com/ismailsefa/crimes-data-analysis-and-visualzation-eda)
* [Google Play Store Apps Data Analysis (EDA)](https://www.kaggle.com/ismailsefa/google-play-store-apps-data-analysis-eda)
* [World Happiness Data Analysis and Visualization](https://www.kaggle.com/ismailsefa/world-happiness-data-analysis-and-visualization)
* [Used Cars Data Analysis and Visualization (EDA)](https://www.kaggle.com/ismailsefa/used-cars-data-analysis-and-visualization-eda)
* [Gender Recognition by Voice Machine Learning SVM](https://www.kaggle.com/ismailsefa/gender-recognition-by-voice-machine-learning-svm)