For infant cry detection, you can use various machine learning models to train your classifier. The choice of model depends on your dataset, feature engineering, and performance requirements. Here are some common machine learning models you can consider for infant cry detection:

1. **Random Forest Classifier:** You are already using Random Forest, an ensemble learning method that works well for classification tasks. It handles numerical features without scaling (categorical features must be encoded first in scikit-learn) and is relatively robust to overfitting, although it can still overfit a small dataset.

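2. **Support Vector Machine (SVM):** SVM is a powerful classifier that can be effective in binary classification tasks like cry detection. It looks for the hyperplane that separates the two classes with the largest margin, and kernels (e.g. RBF) let it learn nonlinear boundaries. It is sensitive to feature scale, so standardize the features first.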

3. **Logistic Regression:** Logistic regression is a simple yet effective model for binary classification. It's easy to interpret and can serve as a baseline model.

4. **K-Nearest Neighbors (K-NN):** K-NN is a non-parametric algorithm that classifies an instance based on the majority class among its k nearest neighbors. It's easy to understand and can work well with an appropriate distance metric. Because it relies on distances, the features should be scaled to comparable ranges.

5. **Gradient Boosting (e.g., XGBoost, LightGBM):** Gradient boosting algorithms can provide high accuracy and can model complex, nonlinear relationships in the data. With class weighting (e.g. XGBoost's `scale_pos_weight` parameter), they can also cope with imbalanced datasets.

6. **Neural Networks:** Deep learning models, such as convolutional neural networks (CNNs) or recurrent neural networks (RNNs), can be employed for audio-based classification tasks like cry detection. They require a larger amount of data and computational resources but can achieve state-of-the-art performance.

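7. **Naive Bayes:** Naive Bayes classifiers are simple probabilistic models. Gaussian Naive Bayes can be applied to continuous acoustic features such as the spectrogram statistics used here and makes a fast baseline, although it assumes the features are independent given the class.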

8. **Ensemble Methods:** You can create an ensemble of different classifiers to improve overall performance. For example, you can combine the predictions of multiple models like Random Forest, SVM, and Logistic Regression.

9. **Hidden Markov Models (HMMs):** HMMs are commonly used for time-series data and speech recognition. They can be adapted for audio-based cry detection.
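Item 8 can be tried directly with scikit-learn's `VotingClassifier`. The sketch below is a minimal example, assuming synthetic data from `make_classification` in place of the real 20-feature table; the `_demo`/`Xd_*` variable names are hypothetical and chosen so the cell does not overwrite the notebook's `X`, `y` or `clf`. Soft voting averages predicted probabilities, which is why the SVM is created with `probability=True`.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the 20 spectrogram features (176 clips, two classes)
X_demo, y_demo = make_classification(n_samples=176, n_features=20, random_state=0)
Xd_train, Xd_test, yd_train, yd_test = train_test_split(
    X_demo, y_demo, test_size=0.2, stratify=y_demo, random_state=42)

ensemble = VotingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=100, random_state=42)),
        # Scale-sensitive models get their own StandardScaler
        ("svm", make_pipeline(StandardScaler(),
                              SVC(kernel="linear", probability=True, random_state=42))),
        ("lr", make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))),
    ],
    voting="soft",  # average the predicted probabilities of the three models
)
ensemble.fit(Xd_train, yd_train)
ensemble_accuracy = ensemble.score(Xd_test, yd_test)
print("Ensemble test accuracy:", ensemble_accuracy)
```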

The choice of model depends on the complexity of your dataset, the quality of your features, and your computational resources. It's often a good idea to start with simpler models (e.g., Random Forest or Logistic Regression) and gradually explore more complex models if needed. Additionally, consider using techniques like feature engineering, hyperparameter tuning, and cross-validation to optimize your model's performance.
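A hedged sketch of the "start simple, then compare" advice: score a simple and a more complex model with stratified 5-fold cross-validation instead of a single train/test split. The data are synthetic (an assumption standing in for the spectrogram features), and F1 is used because it is more informative than accuracy when one class is rarer.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic, mildly imbalanced stand-in for the cry/non-cry feature table
X_demo, y_demo = make_classification(n_samples=176, n_features=20,
                                     weights=[0.6, 0.4], random_state=0)

candidates = {
    "logistic_regression": make_pipeline(StandardScaler(),
                                         LogisticRegression(max_iter=1000)),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=42),
}

# 5 stratified folds: every clip is used for testing exactly once
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
cv_results = {}
for name, estimator in candidates.items():
    scores = cross_val_score(estimator, X_demo, y_demo, cv=cv, scoring="f1")
    cv_results[name] = (scores.mean(), scores.std())
    print(f"{name}: F1 = {scores.mean():.3f} +/- {scores.std():.3f}")
```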


In [75]:
from scipy.io.wavfile import read
import scipy.signal as signal
import matplotlib as mpl
from scipy.stats import skew, kurtosis
cmap = mpl.colormaps['Reds']


In [76]:

sampling_rate,data= read(r"C:\Users\Admin\Downloads\baby-crying-01.wav")



In [77]:
print(data.shape)

(816000, 2)


In [78]:
# The recording is stereo (816000 samples x 2 channels); keep the second channel
wav = data[:, 1]

In [79]:
print(type(wav))

<class 'numpy.ndarray'>


In [80]:
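import matplotlib.pyplot as plt

# Waveform of the selected channel (x-axis in samples)
plt.plot(wav)
plt.xlabel("Sample index")
plt.ylabel("Amplitude")
plt.show()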

In [81]:
f, t, Zxx = signal.stft(wav, fs=sampling_rate)

In [82]:
print(f.shape)

(129,)
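The 129 above follows from `scipy.signal.stft` defaults: `nperseg=256`, and a real signal gets a one-sided spectrum with `nperseg // 2 + 1` frequency bins spaced `fs / nperseg` apart. A small check on a synthetic tone (the 44.1 kHz rate is an assumption, not the rate of `baby-crying-01.wav`):

```python
import numpy as np
import scipy.signal as signal

fs_demo = 44100                            # assumed sample rate for this example
times = np.arange(fs_demo) / fs_demo       # one second of samples
tone = np.sin(2 * np.pi * 440 * times)     # synthetic 440 Hz tone

f_demo, t_demo, Zxx_demo = signal.stft(tone, fs=fs_demo)  # default nperseg=256

print(f_demo.shape)                        # (129,) = 256 // 2 + 1 one-sided bins
bin_width = f_demo[1] - f_demo[0]          # fs / nperseg Hz between bins
peak_bin = np.argmax(np.abs(Zxx_demo).mean(axis=1))
print(bin_width, f_demo[peak_bin])         # the peak lies within one bin of 440 Hz
```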


In [83]:
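import numpy as np
import matplotlib.pyplot as plt

# Magnitude of the STFT: time on the x-axis, frequency on the y-axis
plt.pcolormesh(t, f, np.abs(Zxx), cmap=cmap)
plt.xlabel("Time [s]")
plt.ylabel("Frequency [Hz]")
plt.show()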

In [126]:
import os
import numpy as np
from scipy.io import wavfile
from scipy.signal import spectrogram
import pandas as pd
import sklearn
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, classification_report
from scipy.stats import skew, kurtosis
import joblib


In [127]:
# Directory containing the infant cry audio files
data_dir_cry = r"D:\infant cry detecting\dev\baby_cry\baby_cry_detection\data\301 - Crying baby"

# List all files in the directory
audio_files_cry = os.listdir(data_dir_cry)

In [128]:
# Directory containing the infant non-cry audio files
data_dir_non_cry = r"D:\infant cry detecting\dev\baby_cry\baby_cry_detection\data\901 - Silence"

# List all files in the directory
audio_files_non_cry = os.listdir(data_dir_non_cry)

In [129]:
# STFT parameters
fs = 500       # Placeholder only: wavfile.read() below overwrites fs with each file's sample rate
nperseg = 64   # Window length (in samples) for the STFT
noverlap = 8   # Number of samples shared by consecutive windows

# Initialize empty lists for features and labels
features = []
labels = []

# Loop through each cry audio file
for audio_file in audio_files_cry:
    # Check if the file is a WAV file
    if audio_file.endswith(".wav"):
        # Construct the full path to the audio file
        audio_path = os.path.join(data_dir_cry, audio_file)
        
        # Read the audio file
        fs, audio_signal = wavfile.read(audio_path)

        # Compute STFT
        f, t, Sxx = spectrogram(audio_signal, fs=fs, nperseg=nperseg, noverlap=noverlap)

        # Calculate the 20 features from STFT
        tf_mean = np.mean(Sxx)
        tf_std = np.std(Sxx)
        
        tma = np.max(Sxx, axis=0)
        tma_max = np.max(tma)
        tma_min = np.min(tma)
        tma_mean = np.mean(tma)
        tma_std = np.std(tma)
        tma_skewness = skew(tma)
        tma_kurt = kurtosis(tma)
        
        fma = np.max(Sxx, axis=1)
        fma_max = np.max(fma)
        fma_min = np.min(fma)
        fma_mean = np.mean(fma)
        fma_std = np.std(fma)
        fma_skewness = skew(fma)
        fma_kurt = kurtosis(fma)
        
        fsda = np.std(Sxx, axis=1)
        fsda_max = np.max(fsda)
        fsda_min = np.min(fsda)
        fsda_mean = np.mean(fsda)
        fsda_std = np.std(fsda)
        fsda_skewness = skew(fsda)
        fsda_kurt = kurtosis(fsda)

        # Every file in the "301 - Crying baby" folder is a cry, so label it 1
        is_cry = 1

        # Combine the label and the 20 features into one row ('IsCry' is dropped before training)
        feature_vector = np.array([is_cry, tf_mean, tf_std,
                                   tma_max, tma_min, tma_mean, tma_std, tma_skewness, tma_kurt,
                                   fma_max, fma_min, fma_mean, fma_std, fma_skewness, fma_kurt,
                                   fsda_max, fsda_min, fsda_mean, fsda_std, fsda_skewness, fsda_kurt])

        # Append feature vector and label to lists
        features.append(feature_vector)
        labels.append(is_cry)
        
print(feature_vector)  # Last row added: label first, then the 20 features

[1.00000000e+00 1.11425845e-06 7.17823286e-06 4.07754356e-04
 9.03873143e-10 1.97792342e-05 3.10823634e-05 3.85379485e+00
 2.10503386e+01 4.07754356e-04 3.02229075e-12 2.66033549e-05
 8.19006018e-05 3.75852090e+00 1.33049486e+01 2.64877472e-05
 2.61818209e-13 2.18220225e-06 6.11661335e-06 3.29270060e+00
 9.60648201e+00]
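To make the feature definitions concrete: with `nperseg=64` and `noverlap=8`, `spectrogram` returns `Sxx` with 33 frequency rows and one column per window, so `axis=0` reductions (TMA) give one value per time frame while `axis=1` reductions (FMA, FSDA) give one value per frequency bin. A sketch on a synthetic 1000-sample signal (the sample rate is assumed):

```python
import numpy as np
from scipy.signal import spectrogram

nperseg, noverlap = 64, 8                 # same values as the training loop
fs_demo = 8000                            # assumed sample rate, for illustration
demo_signal = np.random.default_rng(0).standard_normal(1000)  # synthetic clip

f_demo, t_demo, Sxx_demo = spectrogram(demo_signal, fs=fs_demo,
                                       nperseg=nperseg, noverlap=noverlap)

n_freq = nperseg // 2 + 1                                         # 33 frequency bins
n_frames = (len(demo_signal) - noverlap) // (nperseg - noverlap)  # 17 time frames
print(Sxx_demo.shape)                     # rows = frequencies, columns = frames

tma_demo = np.max(Sxx_demo, axis=0)       # TMA: one value per time frame
fma_demo = np.max(Sxx_demo, axis=1)       # FMA: one value per frequency bin
fsda_demo = np.std(Sxx_demo, axis=1)      # FSDA: one value per frequency bin
print(tma_demo.shape, fma_demo.shape, fsda_demo.shape)
```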


In [130]:
# Loop through each audio file in the non-cry directory
for audio_file in audio_files_non_cry:
    # Check if the file is a WAV file
    if audio_file.endswith(".wav"):
        # Construct the full path to the audio file
        audio_path = os.path.join(data_dir_non_cry, audio_file)
        
        # Read the audio file
        fs, audio_signal = wavfile.read(audio_path)

        # Compute STFT
        f, t, Sxx = spectrogram(audio_signal, fs=fs, nperseg=nperseg, noverlap=noverlap)

        # Calculate the 20 features from STFT
        tf_mean = np.mean(Sxx)
        tf_std = np.std(Sxx)
        
        tma = np.max(Sxx, axis=0)
        tma_max = np.max(tma)
        tma_min = np.min(tma)
        tma_mean = np.mean(tma)
        tma_std = np.std(tma)
        tma_skewness = skew(tma)
        tma_kurt = kurtosis(tma)
        
        fma = np.max(Sxx, axis=1)
        fma_max = np.max(fma)
        fma_min = np.min(fma)
        fma_mean = np.mean(fma)
        fma_std = np.std(fma)
        fma_skewness = skew(fma)
        fma_kurt = kurtosis(fma)
        
        fsda = np.std(Sxx, axis=1)
        fsda_max = np.max(fsda)
        fsda_min = np.min(fsda)
        fsda_mean = np.mean(fsda)
        fsda_std = np.std(fsda)
        fsda_skewness = skew(fsda)
        fsda_kurt = kurtosis(fsda)

        # Label non-cry audio as 0
        is_cry = 0

        # Combine the label and the 20 features into one row ('IsCry' is dropped before training)
        feature_vector = np.array([is_cry, tf_mean, tf_std,
                                   tma_max, tma_min, tma_mean, tma_std, tma_skewness, tma_kurt,
                                   fma_max, fma_min, fma_mean, fma_std, fma_skewness, fma_kurt,
                                   fsda_max, fsda_min, fsda_mean, fsda_std, fsda_skewness, fsda_kurt])

        # Append feature vector and label to lists
        features.append(feature_vector)
        labels.append(is_cry)
print(feature_vector)       


[0.00000000e+00 4.25156479e-08 1.14318391e-06 2.06512152e-04
 1.15668874e-09 1.18215473e-06 6.43134626e-06 1.90275364e+01
 4.53019300e+02 2.06512152e-04 1.01963176e-08 7.01154886e-06
 3.53567666e-05 5.43749967e+00 2.77179056e+01 6.43170142e-06
 7.60432162e-10 2.23845490e-07 1.10258861e-06 5.40278890e+00
 2.74526798e+01]
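The cry and non-cry loops repeat the same 20 statistics line by line. One possible refactor, sketched below with hypothetical names (`spectrogram_features`, `profile_stats`, `FEATURE_NAMES`), computes them in a single function whose output order matches the DataFrame columns, so the training loops and the prediction code cannot drift apart:

```python
import numpy as np
from scipy.stats import kurtosis, skew

# Column order matches column_names in the DataFrame cell (without 'IsCry')
FEATURE_NAMES = ["TF_Mean", "TF_Std"] + [
    f"{group}_{stat}"
    for group in ("TMA", "FMA", "FSDA")
    for stat in ("Max", "Min", "Mean", "Std", "Skewness", "Kurtosis")
]

def profile_stats(profile):
    """Max, min, mean, std, skewness and kurtosis of a 1-D profile."""
    return [np.max(profile), np.min(profile), np.mean(profile),
            np.std(profile), skew(profile), kurtosis(profile)]

def spectrogram_features(Sxx):
    """Hypothetical helper: the 20 features (no label) of a spectrogram Sxx."""
    tma = np.max(Sxx, axis=0)   # max over frequency, per time frame
    fma = np.max(Sxx, axis=1)   # max over time, per frequency bin
    fsda = np.std(Sxx, axis=1)  # std over time, per frequency bin
    values = ([np.mean(Sxx), np.std(Sxx)]
              + profile_stats(tma) + profile_stats(fma) + profile_stats(fsda))
    return np.array(values)

# Usage on a random 33 x 17 "spectrogram"
Sxx_rand = np.random.default_rng(0).random((33, 17))
demo_vector = spectrogram_features(Sxx_rand)
print(dict(zip(FEATURE_NAMES, demo_vector.round(3))))
```

Each loop body could then append `np.concatenate(([is_cry], spectrogram_features(Sxx)))`.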


In [131]:
# Build a DataFrame with the label column followed by the 20 feature columns
column_names = ['IsCry', 'TF_Mean', 'TF_Std', 'TMA_Max', 'TMA_Min', 'TMA_Mean', 'TMA_Std', 'TMA_Skewness', 'TMA_Kurtosis',
                'FMA_Max', 'FMA_Min', 'FMA_Mean', 'FMA_Std', 'FMA_Skewness', 'FMA_Kurtosis',
                'FSDA_Max', 'FSDA_Min', 'FSDA_Mean', 'FSDA_Std', 'FSDA_Skewness', 'FSDA_Kurtosis']

df = pd.DataFrame(features, columns=column_names)

# Shuffle the DataFrame
df = df.sample(frac=1, random_state=42).reset_index(drop=True)

# Split the data into training and testing sets (80% train, 20% test)
X = df.drop('IsCry', axis=1)
y = df['IsCry']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
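The classes are imbalanced (the logistic regression report further down shows 108 non-cry and 68 cry clips), so passing `stratify=y` to `train_test_split` keeps the same cry fraction in both splits. A sketch on synthetic labels with those counts (the placeholder features are an assumption):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic labels with the same class counts as the full dataset
y_demo = np.array([0] * 108 + [1] * 68)
X_demo = np.arange(len(y_demo)).reshape(-1, 1)  # placeholder features

Xd_train, Xd_test, yd_train, yd_test = train_test_split(
    X_demo, y_demo, test_size=0.2, stratify=y_demo, random_state=42)

print("overall cry fraction:", y_demo.mean())
print("train cry fraction:", yd_train.mean())
print("test cry fraction:", yd_test.mean())
```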



Train a Random Forest Classifier

In [132]:
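# Note: X_train/X_test were already created from df in the previous cell, so
# dropping NaN rows from df here would not change them. Missing values are
# imputed later (SimpleImputer) for the models that cannot handle NaNs.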

# Initialize and train a Random Forest Classifier
clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X_train, y_train)

# Make predictions on the test set
y_pred = clf.predict(X_test)

# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
report = classification_report(y_test, y_pred)

print("Accuracy:", accuracy)
print("Classification Report:\n", report)

# Save the trained model to a file
model_path = "cry_detection_model.pkl"
joblib.dump(clf, model_path)

# No need to set feature names by hand: because clf was fit on a DataFrame,
# scikit-learn already stores the column names in clf.feature_names_in_


Accuracy: 1.0
Classification Report:
               precision    recall  f1-score   support

         0.0       1.00      1.00      1.00        23
         1.0       1.00      1.00      1.00        13

    accuracy                           1.00        36
   macro avg       1.00      1.00      1.00        36
weighted avg       1.00      1.00      1.00        36
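The model saved with `joblib.dump` can be reloaded with `joblib.load`, for example in a separate inference script. A self-contained round-trip sketch, using a small synthetic Random Forest as a stand-in and a temporary directory so nothing is left on disk; joblib files are pickles, so load them only from trusted sources and with a compatible scikit-learn version:

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Small synthetic model standing in for the trained cry classifier
X_demo, y_demo = make_classification(n_samples=100, n_features=20, random_state=0)
demo_clf = RandomForestClassifier(n_estimators=50, random_state=42).fit(X_demo, y_demo)

with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "cry_detection_model.pkl")
    joblib.dump(demo_clf, demo_path)        # what the training cell does
    restored_clf = joblib.load(demo_path)   # what an inference script would do

same_predictions = np.array_equal(demo_clf.predict(X_demo), restored_clf.predict(X_demo))
print("Identical predictions after reload:", same_predictions)
```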



In [133]:
sklearn.metrics.confusion_matrix(y_test, y_pred)

array([[23,  0],
       [ 0, 13]], dtype=int64)
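In scikit-learn's confusion matrix, rows are true classes and columns are predicted classes, so for labels `[0, 1]` the entries unpack as `tn, fp, fn, tp`. A worked example on made-up labels (for illustration only), where the matrix is `[[5, 1], [1, 3]]` and both precision and recall for the cry class equal 3/4:

```python
import numpy as np
from sklearn.metrics import confusion_matrix, precision_score, recall_score

# Made-up labels (1 = cry, 0 = non-cry)
y_true_demo = np.array([0, 0, 0, 0, 1, 1, 1, 0, 1, 0])
y_pred_demo = np.array([0, 0, 1, 0, 1, 1, 0, 0, 1, 0])

cm = confusion_matrix(y_true_demo, y_pred_demo, labels=[0, 1])
tn, fp, fn, tp = cm.ravel()   # rows = true class, columns = predicted class

precision = tp / (tp + fp)    # of the clips flagged as cry, the share that were cries
recall = tp / (tp + fn)       # of the real cries, the share that were caught
print(cm)
print(precision, recall)
```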

In [134]:
import numpy as np
from scipy.io import wavfile
from scipy.signal import spectrogram
from scipy.stats import skew, kurtosis
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
import joblib

# Function to extract the 20 features (no label) from an audio file.
# nperseg and noverlap must match the values used to build the training set.
def extract_features(audio_path, nperseg=64, noverlap=8):
    # Read the audio file; the sample rate fs comes from the file itself
    fs, audio_signal = wavfile.read(audio_path)

    # Compute STFT
    f, t, Sxx = spectrogram(audio_signal, fs=fs, nperseg=nperseg, noverlap=noverlap)

    # Calculate the 20 features from STFT
    tf_mean = np.mean(Sxx)
    tf_std = np.std(Sxx)

    tma = np.max(Sxx, axis=0)
    tma_max = np.max(tma)
    tma_min = np.min(tma)
    tma_mean = np.mean(tma)
    tma_std = np.std(tma)
    tma_skewness = skew(tma)
    tma_kurt = kurtosis(tma)

    fma = np.max(Sxx, axis=1)
    fma_max = np.max(fma)
    fma_min = np.min(fma)
    fma_mean = np.mean(fma)
    fma_std = np.std(fma)
    fma_skewness = skew(fma)
    fma_kurt = kurtosis(fma)

    fsda = np.std(Sxx, axis=1)
    fsda_max = np.max(fsda)
    fsda_min = np.min(fsda)
    fsda_mean = np.mean(fsda)
    fsda_std = np.std(fsda)
    fsda_skewness = skew(fsda)
    fsda_kurt = kurtosis(fsda)

    # Combine features into a feature vector (same order as the training columns)
    feature_vector = np.array([tf_mean, tf_std,
                               tma_max, tma_min, tma_mean, tma_std, tma_skewness, tma_kurt,
                               fma_max, fma_min, fma_mean, fma_std, fma_skewness, fma_kurt,
                               fsda_max, fsda_min, fsda_mean, fsda_std, fsda_skewness, fsda_kurt])

    # No label is computed and nothing is appended to the global features/labels
    # lists: at prediction time the label is unknown, and appending would mix
    # unlabeled clips into the training data
    return feature_vector


# Path to the new audio clip 
new_audio_path = r"D:\infant cry detecting\dev\baby_cry\baby_cry_detection\data\903 - Baby laugh\laugh_1.m4a_9.wav"


#baby laugh
#r"D:\infant cry detecting\dev\baby_cry\baby_cry_detection\data\903 - Baby laugh\laugh_1.m4a_9.wav"

#baby cry
#r"D:\infant cry detecting\dev\baby_cry\baby_cry_detection\data\301 - Crying baby\V_2017-04-01+08_06_22=0_30.mp3_12.wav"

#silence
#r"D:\infant cry detecting\dev\baby_cry\baby_cry_detection\data\901 - Silence\silence.wav_8.wav"

# Extract features from the new audio clip
new_feature_vector = extract_features(new_audio_path)

# Make a prediction using the trained model; wrapping the vector in a DataFrame
# with the training column names avoids scikit-learn's feature-name warning
prediction = clf.predict(pd.DataFrame([new_feature_vector], columns=X.columns))
print(prediction)

# Interpret the prediction (1 = cry, 0 = non-cry)
if prediction[0] == 1:
    print("The new audio clip contains a cry sound.")
else:
    print("The new audio clip does not contain a cry sound.")

[0.]
The new audio clip does not contain a cry sound.




In [117]:
# Folder containing the audio clips you want to classify
folder_path = r"D:\infant cry detecting\dev\baby_cry\baby_cry_detection\data\New folder"

# List all audio files in the folder
audio_files = [f for f in os.listdir(folder_path) if f.endswith(".wav")]

# Classify each audio clip in the folder
for audio_file in audio_files:
    # Get the full path to the audio file
    audio_path = os.path.join(folder_path, audio_file)

    # Extract features from the audio clip
    new_feature_vector = extract_features(audio_path)

    # Make a prediction using the trained model
    prediction = clf.predict([new_feature_vector])

    # Interpret the prediction
    if prediction[0] == 1:
        print(f" '{audio_file}' - a cry sound.")
    else:
        print(f"'{audio_file}' - not a cry sound.")


'Louise_01.m4a_0.wav' - not a cry sound.
 'Louise_01.m4a_1.wav' - a cry sound.
 'Louise_01.m4a_10.wav' - a cry sound.
 'Louise_01.m4a_11.wav' - a cry sound.
 'Louise_01.m4a_12.wav' - a cry sound.
 'Louise_01.m4a_13.wav' - a cry sound.
 'Louise_01.m4a_14.wav' - a cry sound.
 'Louise_01.m4a_2.wav' - a cry sound.
 'Louise_01.m4a_3.wav' - a cry sound.
 'Louise_01.m4a_4.wav' - a cry sound.
 'Louise_01.m4a_5.wav' - a cry sound.
 'Louise_01.m4a_6.wav' - a cry sound.
 'Louise_01.m4a_7.wav' - a cry sound.
 'Louise_01.m4a_8.wav' - a cry sound.
 'Louise_01.m4a_9.wav' - a cry sound.
 'margot.m4a_0.wav' - a cry sound.
 'margot.m4a_1.wav' - a cry sound.
 'margot.m4a_10.wav' - a cry sound.
'margot.m4a_11.wav' - not a cry sound.
 'margot.m4a_12.wav' - a cry sound.
 'margot.m4a_13.wav' - a cry sound.
 'margot.m4a_14.wav' - a cry sound.
 'margot.m4a_15.wav' - a cry sound.
 'margot.m4a_16.wav' - a cry sound.
 'margot.m4a_17.wav' - a cry sound.
 'margot.m4a_18.wav' - a cry sound.
 'margot.m4a_19.wav' - a 
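`os.listdir` returns names in arbitrary order, and sorting them as plain strings puts `_10.wav` right after `_1.wav`, as in the output above. A small sketch of a natural-sort key (the helper name `natural_key` is hypothetical) that lists the segments in numeric order:

```python
import re

def natural_key(name):
    """Split out digit runs so 'clip_10.wav' sorts after 'clip_2.wav'."""
    # re.split with a capture group keeps text at even and digits at odd positions
    return [int(part) if part.isdigit() else part.lower()
            for part in re.split(r"(\d+)", name)]

demo_files = ["Louise_01.m4a_0.wav", "Louise_01.m4a_10.wav",
              "Louise_01.m4a_2.wav", "margot.m4a_1.wav"]
ordered = sorted(demo_files, key=natural_key)
print(ordered)
```

Iterating over `sorted(audio_files, key=natural_key)` in the classification loop would print the segments in time order.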

In [119]:
import os
import numpy as np
from scipy.io import wavfile
from scipy.signal import spectrogram
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, classification_report
from scipy.stats import skew, kurtosis
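Missing values appear because skewness and kurtosis are undefined for a constant profile. Assuming a silent clip produces an all-zero spectrogram, its TMA/FMA/FSDA profiles have zero variance; recent SciPy releases return `nan` for them, and `SimpleImputer(strategy='mean')` then fills each NaN with its column mean:

```python
import numpy as np
from scipy.stats import kurtosis, skew
from sklearn.impute import SimpleImputer

# Assumption: a digitally silent clip yields an all-zero spectrogram
Sxx_silent = np.zeros((33, 17))
tma_silent = np.max(Sxx_silent, axis=0)    # constant profile, zero variance

# Recent SciPy releases return nan here (possibly with a RuntimeWarning)
print(skew(tma_silent), kurtosis(tma_silent))

# Mean imputation replaces each NaN with the mean of its column
X_demo = np.array([[np.nan, 1.0],
                   [2.0, 3.0],
                   [4.0, 5.0]])
X_filled = SimpleImputer(strategy="mean").fit_transform(X_demo)
print(X_filled)   # the NaN becomes (2.0 + 4.0) / 2 = 3.0
```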


In [120]:
# Handle missing values in X using SimpleImputer (replace NaNs with the mean of each feature).
# Fitted on all rows here for simplicity; fitting on X_train only avoids leaking test statistics.
imputer = SimpleImputer(strategy='mean')
X = imputer.fit_transform(X)

# Convert the target variable to integers (0 for non-cry, 1 for cry)
y = y.astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize and train a Support Vector Machine (SVM) classifier with a linear kernel
clf = SVC(kernel='linear', C=1.0, random_state=42)
clf.fit(X_train, y_train)

# Predict on the held-out test set
y_pred = clf.predict(X_test)

# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
report = classification_report(y_test, y_pred)

print("Accuracy:", accuracy)
print("Classification Report:\n", report)

# Now, for testing a new audio clip:
new_audio_path = r"D:\infant cry detecting\dev\baby_cry\baby_cry_detection\data\301 - Crying baby\V_2017-04-01+08_06_22=0_30.mp3_12.wav"

#baby laugh
#r"D:\infant cry detecting\dev\baby_cry\baby_cry_detection\data\903 - Baby laugh\laugh_1.m4a_9.wav"

#baby cry
#r"D:\infant cry detecting\dev\baby_cry\baby_cry_detection\data\301 - Crying baby\V_2017-04-01+08_06_22=0_30.mp3_12.wav"

#silence
#r"D:\infant cry detecting\dev\baby_cry\baby_cry_detection\data\901 - Silence\silence.wav_8.wav"

# Extract the same 20 features used for training (extract_features reads the file itself)
feature_vector = extract_features(new_audio_path)

# Fill any missing values with the means learned by the imputer above
feature_vector = imputer.transform([feature_vector])

# Use the trained SVM classifier to predict the label for the new audio clip
predicted_label = clf.predict(feature_vector)

# Display the predicted label (1 for cry, 0 for non-cry)
print("Predicted Label for the New Audio Clip:", predicted_label[0])


Accuracy: 0.9444444444444444
Classification Report:
               precision    recall  f1-score   support

           0       0.96      0.96      0.96        23
           1       0.92      0.92      0.92        13

    accuracy                           0.94        36
   macro avg       0.94      0.94      0.94        36
weighted avg       0.94      0.94      0.94        36

Predicted Label for the New Audio Clip: 1
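The linear SVM above sees raw features whose scales differ by many orders of magnitude (compare `TF_Mean` around 1e-6 with `TMA_Kurtosis` in the tens or hundreds in the printed vectors). A sketch that chains imputation, standardization and the SVM in one `Pipeline`, so the imputer and scaler are fitted on training rows only; the data are synthetic with artificially stretched scales and injected NaNs (assumptions, not the real features):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.impute import SimpleImputer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X_demo, y_demo = make_classification(n_samples=176, n_features=20, random_state=0)
X_demo = X_demo * np.logspace(-6, 2, X_demo.shape[1])  # stretch column scales
X_demo[::15, 6] = np.nan  # a few missing values, as from silent clips

Xd_train, Xd_test, yd_train, yd_test = train_test_split(
    X_demo, y_demo, test_size=0.2, stratify=y_demo, random_state=42)

svm_pipe = Pipeline([
    ("impute", SimpleImputer(strategy="mean")),  # learns means from training rows only
    ("scale", StandardScaler()),                 # zero mean, unit variance per feature
    ("svc", SVC(kernel="linear", C=1.0)),
])
svm_pipe.fit(Xd_train, yd_train)
svm_accuracy = accuracy_score(yd_test, svm_pipe.predict(Xd_test))
print("Test accuracy:", svm_accuracy)
```

A new clip's raw vector could then go straight to `svm_pipe.predict(feature_vector.reshape(1, -1))`, with no separate imputer call.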


Logistic Regression

In [121]:
from sklearn.linear_model import LogisticRegression

In [122]:
# Initialize and train a Logistic Regression classifier on ALL rows
clf = LogisticRegression(max_iter=1000, random_state=42)
clf.fit(X, y)

# Predict on the same rows used for training
y_pred = clf.predict(X)

# Evaluate the model. This is training accuracy, so it is optimistic and not
# comparable with the held-out scores of the other models; fit on X_train and
# score on X_test for a fair comparison
accuracy = accuracy_score(y, y_pred)
report = classification_report(y, y_pred)

print("Accuracy:", accuracy)
print("Classification Report:\n", report)

# Save the trained model to a file
model_path = "cry_detection_logistic_regression_model.pkl"
joblib.dump(clf, model_path)

# Now, for testing a new audio clip:
new_audio_path = r"D:\infant cry detecting\dev\baby_cry\baby_cry_detection\data\901 - Silence\silence.wav_8.wav"
#baby laugh
#r"D:\infant cry detecting\dev\baby_cry\baby_cry_detection\data\903 - Baby laugh\laugh_1.m4a_9.wav"

#baby cry
#r"D:\infant cry detecting\dev\baby_cry\baby_cry_detection\data\301 - Crying baby\V_2017-04-01+08_06_22=0_30.mp3_12.wav"

#silence
#r"D:\infant cry detecting\dev\baby_cry\baby_cry_detection\data\901 - Silence\silence.wav_8.wav"

# Extract the same 20 features used for training
feature_vector = extract_features(new_audio_path)

# Fill missing values with the imputer fitted in the SVM section
feature_vector = imputer.transform([feature_vector])

# Use the trained Logistic Regression classifier to predict the label for the new audio clip
predicted_label = clf.predict(feature_vector)

# Display the predicted label (1 for cry, 0 for non-cry)
print("Predicted Label for the New Audio Clip:", predicted_label[0])

Accuracy: 0.8863636363636364
Classification Report:
               precision    recall  f1-score   support

           0       0.87      0.95      0.91       108
           1       0.91      0.78      0.84        68

    accuracy                           0.89       176
   macro avg       0.89      0.87      0.88       176
weighted avg       0.89      0.89      0.88       176

Predicted Label for the New Audio Clip: 0
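For a score comparable with the other models, the logistic regression should be fitted on a training split and evaluated on held-out data. The sketch below (synthetic data, assumed) also shows `predict_proba`, whose second column is the probability of class 1 (cry) and lets you choose a decision threshold other than 0.5; the 0.7 used here is an arbitrary example:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X_demo, y_demo = make_classification(n_samples=176, n_features=20, random_state=0)
Xd_train, Xd_test, yd_train, yd_test = train_test_split(
    X_demo, y_demo, test_size=0.2, stratify=y_demo, random_state=42)

logreg = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
logreg.fit(Xd_train, yd_train)

train_acc = logreg.score(Xd_train, yd_train)
test_acc = logreg.score(Xd_test, yd_test)          # the number to report
cry_prob_demo = logreg.predict_proba(Xd_test)[:, 1]  # column 1 = P(cry)

# A stricter threshold trades recall for precision
strict_pred = (cry_prob_demo >= 0.7).astype(int)
print(f"train={train_acc:.3f} test={test_acc:.3f}")
```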


K-Nearest Neighbors (K-NN)

In [123]:
from sklearn.neighbors import KNeighborsClassifier 

In [146]:
# Make sure X is a C-contiguous NumPy array (np.ascontiguousarray returns it unchanged if it already is)
X = np.ascontiguousarray(X)

# Split the data into training and testing sets (80% train, 20% test)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize and train a K-Nearest Neighbors (K-NN) classifier
clf = KNeighborsClassifier(n_neighbors=3)  # Number of neighbors; tune as needed
clf.fit(X_train, y_train)

# Make predictions on the test set and evaluate
y_pred = clf.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
report = classification_report(y_test, y_pred)

print("Accuracy:", accuracy)
print("Classification Report:\n", report)

# Save the trained model to a file
model_path = "cry_detection_model_knn.pkl"
joblib.dump(clf, model_path)

# Now, for testing a new audio clip:
new_audio_path = r"D:\infant cry detecting\dev\baby_cry\baby_cry_detection\data\301 - Crying baby\V_2017-04-01+08_06_22=0_30.mp3_12.wav"

#baby laugh
#r"D:\infant cry detecting\dev\baby_cry\baby_cry_detection\data\903 - Baby laugh\laugh_1.m4a_9.wav"

#baby cry
#r"D:\infant cry detecting\dev\baby_cry\baby_cry_detection\data\301 - Crying baby\V_2017-04-01+08_06_22=0_30.mp3_12.wav"

#silence
#r"D:\infant cry detecting\dev\baby_cry\baby_cry_detection\data\901 - Silence\silence.wav_8.wav"

# Extract the same 20 features used for training
feature_vector = extract_features(new_audio_path)

# Fill missing values with the imputer fitted in the SVM section
feature_vector = imputer.transform([feature_vector])

# Use the trained K-NN classifier to predict the label for the new audio clip
predicted_label = clf.predict(feature_vector)

# Display the predicted label (1 for cry, 0 for non-cry)
print("Predicted Label for the New Audio Clip:", predicted_label[0])

Accuracy: 0.8611111111111112
Classification Report:
               precision    recall  f1-score   support

         0.0       0.88      0.91      0.89        23
         1.0       0.83      0.77      0.80        13

    accuracy                           0.86        36
   macro avg       0.85      0.84      0.85        36
weighted avg       0.86      0.86      0.86        36

Predicted Label for the New Audio Clip: 1.0
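K-NN uses Euclidean distances by default, so without scaling the largest-valued features (such as the kurtosis columns) dominate. A hedged sketch that standardizes first and chooses `n_neighbors` and `weights` by 5-fold cross-validation with `GridSearchCV`; the grid values and synthetic data are assumptions:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X_demo, y_demo = make_classification(n_samples=176, n_features=20, random_state=0)
Xd_train, Xd_test, yd_train, yd_test = train_test_split(
    X_demo, y_demo, test_size=0.2, stratify=y_demo, random_state=42)

knn_pipe = Pipeline([
    ("scale", StandardScaler()),       # put all features on a comparable scale
    ("knn", KNeighborsClassifier()),
])
knn_search = GridSearchCV(
    knn_pipe,
    param_grid={"knn__n_neighbors": [1, 3, 5, 7, 9],
                "knn__weights": ["uniform", "distance"]},
    cv=5,
    scoring="f1",
)
knn_search.fit(Xd_train, yd_train)
knn_test_f1 = knn_search.score(Xd_test, yd_test)  # F1 of the refitted best model
print(knn_search.best_params_, knn_test_f1)
```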


Gradient Boosting (e.g., XGBoost)

In [142]:
import xgboost as xgb

In [147]:
# Initialize and train an XGBoost classifier
clf = xgb.XGBClassifier(random_state=42)
clf.fit(X_train, y_train)

# Make predictions on the test set
y_pred = clf.predict(X_test)

# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
report = classification_report(y_test, y_pred)

print("Accuracy:", accuracy)
print("Classification Report:\n", report)

# Save the trained model in XGBoost's JSON format (selected by the .json extension)
model_path = "cry_detection_xgboost_model.json"
clf.save_model(model_path)

# Now, for testing a new audio clip:
new_audio_path = r"D:\infant cry detecting\dev\baby_cry\baby_cry_detection\data\301 - Crying baby\V_2017-04-01+08_06_22=0_30.mp3_12.wav"

#baby laugh
#r"D:\infant cry detecting\dev\baby_cry\baby_cry_detection\data\903 - Baby laugh\laugh_1.m4a_9.wav"

#baby cry
#r"D:\infant cry detecting\dev\baby_cry\baby_cry_detection\data\301 - Crying baby\V_2017-04-01+08_06_22=0_30.mp3_12.wav"

#silence
#r"D:\infant cry detecting\dev\baby_cry\baby_cry_detection\data\901 - Silence\silence.wav_8.wav"

# Extract the same 20 features used for training
feature_vector = extract_features(new_audio_path)

# Fill missing values with the imputer fitted in the SVM section
feature_vector = imputer.transform([feature_vector])

# Use the trained XGBoost classifier to predict the label for the new audio clip
predicted_label = clf.predict(feature_vector)

# Display the predicted label (1 for cry, 0 for non-cry)
print("Predicted Label for the New Audio Clip:", predicted_label[0])

Accuracy: 1.0
Classification Report:
               precision    recall  f1-score   support

         0.0       1.00      1.00      1.00        23
         1.0       1.00      1.00      1.00        13

    accuracy                           1.00        36
   macro avg       1.00      1.00      1.00        36
weighted avg       1.00      1.00      1.00        36

Predicted Label for the New Audio Clip: 1




Neural Networks

In [154]:
import tensorflow as tf
from tensorflow import keras

In [159]:
# Build a neural network model
model = keras.Sequential([
    keras.Input(shape=(20,)),                     # Input layer: one value per feature
    keras.layers.Dense(64, activation='relu'),    # Hidden layer with 64 units and ReLU activation
    keras.layers.Dense(32, activation='relu'),    # Hidden layer with 32 units and ReLU activation
    keras.layers.Dense(1, activation='sigmoid')   # Output layer: probability of a cry
])

# Compile the model
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

# Train the model (20% of the training rows are held out for validation)
model.fit(X_train, y_train, epochs=10, batch_size=32, validation_split=0.2)

# Evaluate the model on the test set; evaluate() returns [loss, accuracy]
accuracy = model.evaluate(X_test, y_test)[1]
print("Test Accuracy:", accuracy)

# Save the trained model
model.save("infant_cry_detection_model.h5")

# Now, for testing a new audio clip:
new_audio_path = r"D:\infant cry detecting\dev\baby_cry\baby_cry_detection\data\301 - Crying baby\V_2017-04-01+08_06_22=0_30.mp3_12.wav"

#baby laugh
#r"D:\infant cry detecting\dev\baby_cry\baby_cry_detection\data\903 - Baby laugh\laugh_1.m4a_9.wav"

#baby cry
#r"D:\infant cry detecting\dev\baby_cry\baby_cry_detection\data\301 - Crying baby\V_2017-04-01+08_06_22=0_30.mp3_12.wav"

#silence
#r"D:\infant cry detecting\dev\baby_cry\baby_cry_detection\data\901 - Silence\silence.wav_8.wav"

# Extract the same 20 features used for training
feature_vector = extract_features(new_audio_path)

# Fill missing values with the imputer fitted in the SVM section
feature_vector = imputer.transform([feature_vector])

# Predict with the neural network (not the previous clf): the sigmoid output
# is the probability of a cry, so threshold it at 0.5 to get a 0/1 label
cry_probability = model.predict(feature_vector)[0, 0]
predicted_label = int(cry_probability >= 0.5)

# Display the predicted label (1 for cry, 0 for non-cry)
print("Predicted Label for the New Audio Clip:", predicted_label)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Test Accuracy: 1.0
Predicted Label for the New Audio Clip: 1
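The Keras network above is trained on unscaled features for 10 epochs. A lightweight sketch of the same 64-32 ReLU architecture with standardized inputs, using scikit-learn's `MLPClassifier` as a stand-in (a swapped-in library, chosen so the example runs without TensorFlow) on synthetic data; as with the sigmoid output, the cry probability is thresholded at 0.5:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X_demo, y_demo = make_classification(n_samples=176, n_features=20, random_state=0)
Xd_train, Xd_test, yd_train, yd_test = train_test_split(
    X_demo, y_demo, test_size=0.2, stratify=y_demo, random_state=42)

# Two ReLU hidden layers of 64 and 32 units, like the Keras model above
mlp = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(64, 32), activation="relu",
                  max_iter=1000, random_state=42),
)
mlp.fit(Xd_train, yd_train)
mlp_cry_prob = mlp.predict_proba(Xd_test)[:, 1]
mlp_pred = (mlp_cry_prob >= 0.5).astype(int)   # same 0.5 cut-off as a sigmoid output
print("Test accuracy:", (mlp_pred == yd_test).mean())
```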
