# Machine learning project: Bird detection in sound recordings

This notebook presents the different models tested to detect whether a sound recording contains a bird. The data preprocessing that builds the file `spectre.csv` is done in the notebook `prepocessing.ipynb`.

## 1. Loading the data

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

In [2]:
df=pd.read_csv("spectre.csv",index_col='itemid').sort_index()

df.head()

| itemid | hasBird | 0 | 100 | 200 | 300 | 400 | 500 | 600 | 700 | 800 | ... | 19000 | 19100 | 19200 | 19300 | 19400 | 19500 | 19600 | 19700 | 19800 | 19900 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 55 | 0 | 0.046503 | 0.052254 | 0.031297 | 0.016394 | 0.015485 | 0.019629 | 0.011567 | 0.006598 | 0.005619 | ... | 0.000149 | 9.3e-05 | 8e-05 | 7.6e-05 | 7.8e-05 | 7.6e-05 | 7.5e-05 | 7.4e-05 | 7.2e-05 | 6.9e-05 |
| 87 | 0 | 0.19054 | 0.036987 | 0.019515 | 0.049982 | 0.126856 | 0.059926 | 0.018224 | 0.011003 | 0.007375 | ... | 2e-06 | 2e-06 | 2e-06 | 2e-06 | 2e-06 | 2e-06 | 2e-06 | 2e-06 | 2e-06 | 2e-06 |
| 99 | 0 | 0.239988 | 0.22326 | 0.166995 | 0.171129 | 0.203208 | 0.133131 | 0.14184 | 0.117775 | 0.126883 | ... | 0.000183 | 0.000182 | 0.000182 | 0.000182 | 0.000182 | 0.000181 | 0.000181 | 0.000181 | 0.00018 | 0.000181 |
| 100 | 1 | 0.089195 | 0.107197 | 0.072524 | 0.061859 | 0.054955 | 0.048336 | 0.04738 | 0.046404 | 0.101157 | ... | 2e-05 | 2e-05 | 2e-05 | 2e-05 | 2.1e-05 | 1.9e-05 | 2e-05 | 2e-05 | 2e-05 | 2e-05 |
| 104 | 0 | 0.136297 | 0.194051 | 0.138342 | 0.138522 | 0.142344 | 0.156182 | 0.161801 | 0.147858 | 0.123645 | ... | 1.5e-05 | 1.5e-05 | 1.5e-05 | 1.5e-05 | 1.5e-05 | 1.5e-05 | 1.5e-05 | 1.5e-05 | 1.5e-05 | 1.5e-05 |
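Before comparing models, it helps to know how balanced `hasBird` is: a classifier that always predicts the majority class already reaches an accuracy equal to that class's share, which is the floor every model below should beat. The sketch uses a small synthetic frame that only mimics the layout of `spectre.csv` (a `hasBird` column plus 200 columns named `0` to `19900`); the random values and the roughly 70/30 class ratio are assumptions, not the real data. On the real data, `df['hasBird'].value_counts(normalize=True)` gives the actual proportions.

```python
import numpy as np
import pandas as pd
from sklearn.dummy import DummyClassifier

# Synthetic stand-in for spectre.csv (assumption: ~30% positives, random spectra)
rng = np.random.default_rng(0)
n = 1000
cols = [str(f) for f in range(0, 20000, 100)]  # column names as read from the CSV
demo = pd.DataFrame(rng.random((n, len(cols))), columns=cols)
demo.insert(0, "hasBird", (rng.random(n) < 0.3).astype(int))

# Share of each class in the target
proportions = demo["hasBird"].value_counts(normalize=True)
print(proportions)

# Majority-class baseline: always predicts the most frequent label
X_demo = demo.drop(columns=["hasBird"])
y_demo = demo["hasBird"]
baseline = DummyClassifier(strategy="most_frequent").fit(X_demo, y_demo)
baseline_acc = baseline.score(X_demo, y_demo)
print(f"Majority-class accuracy: {baseline_acc:.3f}")
```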


In [3]:
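freq = range(0, 20000, 100)  # values matching the 200 spectrum column names (0 to 19900, step 100)

X = df.drop(['hasBird'], axis=1)  # features: one column per spectrum bin

y = df['hasBird']  # target: 1 if a bird is present, 0 otherwise

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33)  # 2/3 training, 1/3 test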

print(X_train.shape,X_test.shape)

(5152, 200) (2538, 200)
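`train_test_split` is called here without `random_state` or `stratify`, so the split (and every train/test score below) changes from one run to the next, and the share of `hasBird = 1` can drift between the two parts. A minimal sketch of a reproducible, stratified split, using synthetic data from `make_classification` as a stand-in for `X` and `y` (assumption: 7690 samples and 200 features as printed above; the 70/30 class weights are made up):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic stand-in with the same shape as the spectral features (assumption)
X, y = make_classification(n_samples=7690, n_features=200,
                           weights=[0.7, 0.3], random_state=0)

# stratify=y keeps the class ratio the same in both parts;
# random_state makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, stratify=y, random_state=42
)

print(X_train.shape, X_test.shape)
print(y_train.mean(), y_test.mean())  # share of positive samples in each part
```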


## 2. Testing different machine learning models

### 2.1. Logistic regression

In [4]:
clf_lr = LogisticRegression(solver='lbfgs')

clf_lr.fit(X_train,y_train)

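train_score = clf_lr.score(X_train,y_train)  # accuracy on the training set
test_score = clf_lr.score(X_test,y_test)     # accuracy on the held-out test set
cross_val=cross_val_score(clf_lr,X,y,cv=5)   # accuracy on each of 5 folds (a fresh clone is fitted per fold)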

print(train_score,test_score)

print(cross_val)

print(cross_val.mean())

0.7721273291925466 0.7734436564223798
[0.77438231 0.77893368 0.76723017 0.76853056 0.77048114]
0.7719115734720416
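The spectral amplitudes span several orders of magnitude (from about 0.24 down to 2e-06 in the rows shown above), and the logistic regression is fitted on the raw values. A common variation, sketched here as an assumption rather than what the notebook does, is to standardize the features inside a `Pipeline`: `cross_val_score` then refits the scaler on each training fold, so no statistics from the held-out fold leak into it. Synthetic data stands in for the real features.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the spectral features (assumption)
X, y = make_classification(n_samples=2000, n_features=200,
                           n_informative=20, random_state=0)

# The scaler is fitted inside each CV training fold, then applied to the held-out fold
clf_lr_scaled = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
cv_scores = cross_val_score(clf_lr_scaled, X, y, cv=5)

print(cv_scores)
print(cv_scores.mean())
```

On the real data this amounts to passing the pipeline to `cross_val_score` in place of `clf_lr`; whether it improves on the 0.772 mean above is not known from these results.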


### 2.2. Decision tree

In [5]:
from sklearn import tree

clf_tree = tree.DecisionTreeClassifier()
clf_tree = clf_tree.fit(X_train, y_train)

train_score = clf_tree.score(X_train,y_train)
test_score = clf_tree.score(X_test,y_test)

cross_val=cross_val_score(clf_tree,X,y,cv=5)

print(train_score,test_score)

print(cross_val)

print(cross_val.mean())

1.0 0.7289204097714737
[0.69245774 0.72041612 0.7236671  0.70936281 0.72171651]
0.713524057217165
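A training accuracy of 1.0 against 0.73 on the test set is the typical sign of an unconstrained tree memorizing the training data: by default `DecisionTreeClassifier` keeps splitting until its leaves are pure. A sketch of comparing depth limits by cross-validation, on synthetic data (assumption); the depths tried are arbitrary.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the spectral features (assumption)
X, y = make_classification(n_samples=2000, n_features=200, random_state=0)

# Compare an unconstrained tree (max_depth=None) with shallower ones
results = {}
for depth in [None, 3, 5, 10]:
    tree_clf = DecisionTreeClassifier(max_depth=depth, random_state=0)
    results[depth] = cross_val_score(tree_clf, X, y, cv=5).mean()
    print(depth, round(results[depth], 3))

# Grown without limit on distinct samples, the tree fits the training set exactly
full_tree = DecisionTreeClassifier(random_state=0).fit(X, y)
print(full_tree.score(X, y), full_tree.get_depth())
```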


### 2.3. Nearest neighbors

In [6]:
from sklearn.neighbors import KNeighborsClassifier
clf_nn = KNeighborsClassifier(n_neighbors=3)

clf_nn.fit(X_train,y_train)

train_score = clf_nn.score(X_train,y_train)
test_score = clf_nn.score(X_test,y_test)

cross_val=cross_val_score(clf_nn,X,y,cv=5)

print(train_score,test_score)

print(cross_val)

print(cross_val.mean())

0.859277950310559 0.7663514578408196
[0.77113134 0.77243173 0.7496749  0.73862159 0.75812744]
0.7579973992197659
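With `n_neighbors=3`, the classifier labels a spectrum by a majority vote among the 3 closest training spectra, using Euclidean distance by default. This also explains part of the gap between 0.86 on the training set and 0.77 on the test set: when scoring on the training set, each point is its own nearest neighbor. A from-scratch numpy sketch of the prediction rule, checked against scikit-learn on small synthetic data (assumption: 20 features instead of 200 to keep it small):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.neighbors import KNeighborsClassifier

def knn_predict(X_train, y_train, X_query, k=3):
    """Majority vote among the k nearest training points (Euclidean distance)."""
    # Pairwise distances, shape (n_query, n_train)
    dists = np.linalg.norm(X_query[:, None, :] - X_train[None, :, :], axis=2)
    # Indices of the k closest training points for each query
    nearest = np.argsort(dists, axis=1)[:, :k]
    votes = y_train[nearest]  # shape (n_query, k)
    # Most frequent label per row (labels are non-negative integers)
    return np.array([np.bincount(row).argmax() for row in votes])

# Small synthetic data set (assumption)
X, y = make_classification(n_samples=300, n_features=20, random_state=0)
X_tr, X_te, y_tr = X[:200], X[200:], y[:200]

ours = knn_predict(X_tr, y_tr, X_te, k=3)
ref = KNeighborsClassifier(n_neighbors=3).fit(X_tr, y_tr).predict(X_te)
print((ours == ref).mean())  # fraction of matching predictions
```

With an odd `k` and two classes, the vote can never be tied, which is one reason odd values such as 3 are usual for binary problems.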


### 2.4. SVM

In [7]:
from sklearn import svm

clf_svm=svm.SVC()

clf_svm.fit(X_train,y_train)

train_score = clf_svm.score(X_train,y_train)
test_score = clf_svm.score(X_test,y_test)

cross_val=cross_val_score(clf_svm,X,y,cv=5)

print(train_score,test_score)

print(cross_val)

print(cross_val.mean())

0.8039596273291926 0.7923561859732072
[0.80884265 0.77828349 0.77178153 0.77373212 0.78478544]
0.7834850455136542
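`svm.SVC()` runs with its defaults (RBF kernel, `C=1.0`, and `gamma='scale'` since scikit-learn 0.22) on unscaled features. A sketch of tuning `C` and `gamma` with `GridSearchCV`, with a `StandardScaler` in the pipeline; the grid values and the synthetic data are assumptions, and on the 7690 real spectra such a search would take noticeably longer than the single fit above.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the spectral features (assumption)
X, y = make_classification(n_samples=600, n_features=200, random_state=0)

pipe = Pipeline([("scale", StandardScaler()), ("svc", SVC(kernel="rbf"))])

# Parameters of a pipeline step are addressed as <step name>__<parameter>
param_grid = {"svc__C": [0.1, 1, 10], "svc__gamma": ["scale", 0.001, 0.01]}
search = GridSearchCV(pipe, param_grid, cv=5)  # 9 combinations x 5 folds
search.fit(X, y)

print(search.best_params_)
print(round(search.best_score_, 3))
```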


## 3. Testing a deep neural network

In [8]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout

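model = Sequential([
    Dropout(.2, input_shape=(200,)),  # 200 spectrum columns as input; drops 20% of them at random during training
    Dense(128, activation='relu'),
    Dense(128, activation='relu'),
    Dense(128, activation='relu'),
    Dense(2, activation='softmax')  # one output per class (hasBird = 0 or 1)
])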

model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

model.fit(X_train, y_train, epochs=10, batch_size=100)

model.evaluate(X_test, y_test)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


[0.47087550163269043, 0.8022064566612244]
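`model.evaluate` returns the loss followed by the metrics given to `compile`, so `[0.4709, 0.8022]` reads as a sparse categorical cross-entropy of about 0.47 and a test accuracy of about 0.80. A small numpy sketch of how both numbers are obtained from predicted class probabilities, using made-up probabilities for four samples (assumption):

```python
import numpy as np

def sparse_categorical_crossentropy(y_true, probs):
    """Mean of -log(probability assigned to the true class)."""
    return -np.mean(np.log(probs[np.arange(len(y_true)), y_true]))

def accuracy(y_true, probs):
    """Share of samples whose most probable class is the true class."""
    return np.mean(probs.argmax(axis=1) == y_true)

# Made-up softmax outputs for 4 samples and 2 classes (hasBird = 0 / 1)
probs = np.array([[0.9, 0.1],
                  [0.2, 0.8],
                  [0.6, 0.4],
                  [0.3, 0.7]])
y_true = np.array([0, 1, 1, 1])  # integer labels, as expected by the "sparse" loss

print(sparse_categorical_crossentropy(y_true, probs))  # ~0.4004
print(accuracy(y_true, probs))                         # 0.75
```

The third sample is misclassified (0.4 on the true class), and it also contributes most of the loss: the loss penalizes low confidence in the right class, while accuracy only counts the argmax.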

## 4. Exporting the best classifier

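Among the scikit-learn models, the support vector machine (SVM) has the best scores (test accuracy 0.792, mean cross-validation accuracy 0.783), so it is the classifier exported. The neural network reached a slightly higher test accuracy (0.802), but on a single train/test split and without cross-validation, so the two figures are not directly comparable.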
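The scikit-learn models were all cross-validated on the same 5 unshuffled stratified folds (the default for `cv=5` with a classifier), whereas the network was scored on one test split. A sketch that makes the comparison explicit, with shuffled stratified folds fixed by `random_state` and the standard deviation reported next to each mean, since gaps of about one point (0.772 vs 0.783) may be within fold-to-fold variation. Synthetic data stands in for `X` and `y` (assumption).

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the spectral features (assumption)
X, y = make_classification(n_samples=1000, n_features=200, random_state=0)

models = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "decision tree": DecisionTreeClassifier(random_state=0),
    "3-NN": KNeighborsClassifier(n_neighbors=3),
    "SVM": SVC(),
}

# Fixed random_state: every model sees exactly the same shuffled folds
folds = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

mean_scores = {}
for name, model in models.items():
    cv = cross_validate(model, X, y, cv=folds, scoring="accuracy")
    mean_scores[name] = cv["test_score"].mean()
    print(f"{name}: {mean_scores[name]:.3f} (+/- {cv['test_score'].std():.3f})")

best = max(mean_scores, key=mean_scores.get)
print("Best:", best)
```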

In [9]:
from joblib import dump

In [10]:
dump(clf_svm,'classifier.joblib')

['classifier.joblib']
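`dump` returns the list of files it wrote. To reuse the classifier, `joblib.load` restores the fitted object, which must then receive features laid out like the training data (here, the 200 spectrum columns in the same order); scikit-learn also recommends loading with the same library version used for saving. A round-trip sketch with a small SVC trained on synthetic data and saved to a temporary directory (both assumptions):

```python
import os
import tempfile

import numpy as np
from joblib import dump, load
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Small synthetic model standing in for clf_svm (assumption)
X, y = make_classification(n_samples=200, n_features=200, random_state=0)
clf = SVC().fit(X, y)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "classifier.joblib")
    written = dump(clf, path)   # list of file names written
    restored = load(path)       # fitted estimator, ready to predict

same_predictions = np.array_equal(clf.predict(X), restored.predict(X))
print(written[0].endswith("classifier.joblib"), same_predictions)
```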