The goal is to develop a machine learning model that predicts the likelihood of Autism Spectrum Disorder (ASD) in individuals from questionnaire-based screening data.
Early detection of Autism Spectrum Disorder is crucial for timely intervention and support. Traditional diagnosis often requires specialized clinical assessment, which may not be easily accessible in all regions. This project aims to assist early screening using machine learning models trained on structured questionnaire responses.
- Source: Kaggle - Autism Screening Data
- Features include:
  - Age
  - Gender
  - Ethnicity
  - Family history of ASD
  - Jaundice at birth
  - Screening test responses (Q1 to Q10)
  - Result from the Autism Spectrum Quotient test
- Class/target: ASD or not ASD
- Handle missing values and inconsistent entries
- Convert categorical data to numerical using encoding
- Normalize/standardize features as needed
- Train-test split (e.g., 80-20)
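
The preprocessing steps above could be sketched roughly as follows. This is a minimal illustration on a tiny made-up table, not the project's actual code: the column names (`age`, `gender`, `jaundice`, `q1`, `label`), the `"?"` missing-value placeholder, and the imputation strategies are all assumptions chosen for the example.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Tiny made-up sample; column names are hypothetical, not the Kaggle schema.
df = pd.DataFrame({
    "age": [24, 31, np.nan, 45, 19, 27, 38, 22, 29, 35],
    "gender": ["m", "f", "f", "m", "?", "f", "m", "m", "f", "f"],
    "jaundice": ["no", "yes", "no", "no", "yes", "no", "no", "yes", "no", "no"],
    "q1": [1, 0, 1, 1, 0, 1, 0, 1, 1, 0],
    "label": [1, 0, 1, 0, 0, 1, 0, 1, 1, 0],
})

# Assumption: inconsistent entries show up as a "?" placeholder.
df = df.replace("?", np.nan)

X = df.drop(columns="label")
y = df["label"]

# Impute and scale numeric columns; impute and one-hot encode categorical ones.
numeric_steps = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])
categorical_steps = Pipeline([
    ("impute", SimpleImputer(strategy="most_frequent")),
    ("encode", OneHotEncoder(handle_unknown="ignore")),
])
preprocess = ColumnTransformer([
    ("num", numeric_steps, ["age"]),
    ("cat", categorical_steps, ["gender", "jaundice"]),
    ("bin", "passthrough", ["q1"]),  # 0/1 answers need no encoding
])

# 80-20 split, stratified so both classes appear in each part.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Fit on the training part only, then reuse the fitted transform on the test part.
X_train_prepared = preprocess.fit_transform(X_train)
X_test_prepared = preprocess.transform(X_test)
```

Fitting the transformer on the training split alone keeps statistics such as the median age from leaking out of the test split.
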
- Understand class imbalance
- Visualize age distribution and test response trends
- Check correlation between features
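
A rough sketch of these exploratory checks, using randomly generated stand-in data rather than the real dataset. The columns `age`, `q1`, `q2`, and `label` are hypothetical, and the target is constructed artificially so that a correlation is visible. The resulting tables could then be passed to Matplotlib or Seaborn for plotting.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in data; none of these values come from the real dataset.
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "age": rng.integers(17, 65, size=n),
    "q1": rng.integers(0, 2, size=n),
    "q2": rng.integers(0, 2, size=n),
})
# Hypothetical target: positive only when both answers are 1.
df["label"] = ((df["q1"] + df["q2"]) == 2).astype(int)

# Class imbalance: share of each class in the target column.
class_share = df["label"].value_counts(normalize=True)

# Age distribution: counts per bin, ready to be drawn as a histogram.
age_counts, age_edges = np.histogram(df["age"], bins=5)

# Response trends: mean answer to each question, per class.
response_rate = df.groupby("label")[["q1", "q2"]].mean()

# Pairwise Pearson correlation between all numeric columns.
corr = df.corr()

print(class_share)
print(response_rate)
print(corr.round(2))
```
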
- Test multiple models:
  - Logistic Regression (Best Performing)
  - Random Forest
  - Support Vector Machine (SVM)
  - Naive Bayes
- Use GridSearchCV or cross-validation for tuning
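
One way the model comparison and tuning might look with scikit-learn is sketched below. The data comes from `make_classification` as a placeholder for the screening dataset, and the recall scoring, the five folds, and the `C` grid are assumptions made for illustration, not the settings used in this project.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Placeholder data standing in for the preprocessed screening features.
X, y = make_classification(
    n_samples=300, n_features=10, n_informative=5, random_state=42
)

# The four candidate models; scaling is bundled in where the model benefits from it.
models = {
    "Logistic Regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "Random Forest": RandomForestClassifier(n_estimators=50, random_state=42),
    "SVM": make_pipeline(StandardScaler(), SVC()),
    "Naive Bayes": GaussianNB(),
}

# 5-fold cross-validated recall for each candidate (scoring choice is an assumption).
cv_scores = {
    name: cross_val_score(model, X, y, cv=5, scoring="recall").mean()
    for name, model in models.items()
}
for name, score in cv_scores.items():
    print(f"{name}: mean recall = {score:.3f}")

# Tune the regularization strength of logistic regression over a small example grid.
param_grid = {"logisticregression__C": [0.01, 0.1, 1, 10]}
search = GridSearchCV(
    models["Logistic Regression"], param_grid, cv=5, scoring="recall"
)
search.fit(X, y)
print("Best C:", search.best_params_["logisticregression__C"])
```

The `logisticregression__C` key follows the step-naming convention of `make_pipeline`, which lowercases the estimator's class name.
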
- Accuracy
- Precision, Recall, F1-Score
- Confusion Matrix
- ROC-AUC Score
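
The metrics listed above can all be computed with `sklearn.metrics`. The sketch below uses ten hand-written labels and scores purely for illustration; they are not results from this project, and the 0.5 decision threshold is an assumption.

```python
import numpy as np
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    f1_score,
    precision_score,
    recall_score,
    roc_auc_score,
)

# Hand-written example labels and predicted probabilities (illustrative only).
y_true = np.array([1, 1, 1, 1, 0, 0, 0, 0, 1, 0])
y_score = np.array([0.9, 0.8, 0.7, 0.4, 0.3, 0.2, 0.6, 0.1, 0.85, 0.35])

# Turn probabilities into class predictions with an assumed 0.5 threshold.
y_pred = (y_score >= 0.5).astype(int)

accuracy = accuracy_score(y_true, y_pred)
precision = precision_score(y_true, y_pred)
recall = recall_score(y_true, y_pred)
f1 = f1_score(y_true, y_pred)

# Rows are true classes (0, 1); columns are predicted classes (0, 1).
cm = confusion_matrix(y_true, y_pred)

# ROC-AUC is computed from the scores, not from the thresholded predictions.
auc = roc_auc_score(y_true, y_score)

print(f"accuracy={accuracy:.2f} precision={precision:.2f} "
      f"recall={recall:.2f} f1={f1:.2f} auc={auc:.2f}")
print(cm)
```

For a screening tool, recall is the figure to watch most closely, since a false negative means a potential ASD case goes unflagged.
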
- Logistic Regression achieved:
  - Accuracy: 86.88%
  - High recall, which makes it suitable for screening tasks
- Streamlit app to collect questionnaire responses and display predictions
- User-friendly interface for non-technical audiences
- Python
- Pandas, NumPy
- Matplotlib, Seaborn
- Scikit-learn
- (Optional) Streamlit for frontend
- Best model (Logistic Regression) achieved 86.88% accuracy
- Balanced performance with good recall for detecting potential ASD cases
The project shows the potential of machine learning to support early autism screening using readily available questionnaire data. With further development and validation, such tools can complement clinical assessments, especially in resource-constrained settings.
- Train on larger and more diverse datasets
- Implement ensemble learning methods
- Enhance the user interface for deployment
- Collaborate with healthcare professionals for clinical validation