## Data Description 

Autistic Spectrum Disorder (ASD) is a neurodevelopmental condition associated with significant healthcare costs, and early diagnosis can significantly reduce these costs. Unfortunately, waiting times for an ASD diagnosis are long and the diagnostic procedures are not cost-effective. The economic impact of autism and the growing number of ASD cases worldwide reveal an urgent need for screening methods that are effective and easy to implement.

Therefore, a time-efficient and accessible ASD screening method is urgently needed, both to help health professionals and to tell individuals whether they should pursue a formal clinical diagnosis. The rapid growth in the number of ASD cases worldwide creates a need for datasets of behavioural traits.

However, such datasets are rare, which makes it difficult to perform thorough analyses aimed at improving the efficiency, sensitivity, specificity and predictive accuracy of the ASD screening process. Very few clinical or screening datasets for autism are currently available, and most of the existing ones are genetic in nature.

Hence, we propose a new dataset for autism screening of adults that contains 20 features, intended for further analysis, especially for identifying influential autistic traits and improving the classification of ASD cases. The dataset records ten behavioural features (the AQ-10-Adult questionnaire) plus ten individual characteristics that have proved effective in distinguishing ASD cases from controls in behavioural science.

### Column Descriptions

A1_Score to A10_Score - answers to the ten AQ-10 screening questions ('0', '1') <br>
age - age in years <br>
gender - ('f', 'm') female or male <br>
ethnicity - ('White-European', 'Latino', 'Others', 'Black', 'Asian', 'Middle Eastern ', 'Pasifika', 'South Asian', 'Hispanic', 'Turkish', 'others'); note the trailing space in 'Middle Eastern ' and the separate 'Others' and 'others' levels <br>
jundice - ('no', 'yes') whether the case was born with jaundice <br>
contry_of_res - country of residence <br>
used_app_before - ('no', 'yes') whether the user has used a screening app before <br>
result - result of the test (the total screening score) <br>
age_desc - text description of age (only '18 and more') <br>
relation - who is completing the test ('Self', 'Parent', 'Health care professional', 'Relative', 'Others') <br>
Class/ASD - ('NO', 'YES') whether the case has Autistic Spectrum Disorder (the target) <br>
austim - ('no', 'yes') whether a family member has autism <br>
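In the `df.head()` rows printed further down, `result` equals the number of `1` answers across `A1_Score` to `A10_Score`, which is how the AQ-10 is scored. The sketch below assumes that relationship holds for every row, and also assumes the usual AQ-10 referral cut-off (a score above 6 counts as a positive screen) defines `Class/ASD`. Both are assumptions to verify against the full data, and `aq10_score` and `screen_positive` are hypothetical helper names.

```python
# Minimal sketch: recompute the AQ-10 total and the screening label.
# Assumptions: `result` == number of "1" answers, and a score above 6
# is a positive screen (the AQ-10 referral cut-off).

def aq10_score(answers):
    """Sum ten AQ-10 item answers given as '0'/'1' strings or ints."""
    if len(answers) != 10:
        raise ValueError("AQ-10 expects exactly ten answers")
    return sum(int(a) for a in answers)


def screen_positive(score, cutoff=6):
    """Return 'YES' when the score exceeds the cut-off, else 'NO'."""
    return "YES" if score > cutoff else "NO"


# (A1_Score..A10_Score, result, Class/ASD) for the five rows of df.head()
rows = [
    (["1", "1", "1", "1", "0", "0", "1", "1", "0", "0"], 6, "NO"),
    (["1", "1", "0", "1", "0", "0", "0", "1", "0", "1"], 5, "NO"),
    (["1", "1", "0", "1", "1", "0", "1", "1", "1", "1"], 8, "YES"),
    (["1", "1", "0", "1", "0", "0", "1", "1", "0", "1"], 6, "NO"),
    (["1", "0", "0", "0", "0", "0", "0", "1", "0", "0"], 2, "NO"),
]

for answers, result, label in rows:
    score = aq10_score(answers)
    # recomputed score, stored result, predicted label, stored label
    print(score, result, screen_positive(score), label)
```

If this holds on the full dataset, `result` (and therefore the label) is fully determined by the ten answers, so a classifier given `result` as a feature would effectively be reading the target.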

**Problem Statement**

We have to predict whether a patient has autism or not, i.e. the value of the Class/ASD column.
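As a supervised-learning task this is binary classification with `Class/ASD` as the target. A minimal sketch of separating the label from the features and encoding 'YES'/'NO' as 1/0, on a few columns copied from the `df.head()` rows; the 1/0 encoding is a modelling choice, not something the dataset prescribes.

```python
import pandas as pd

# A few columns from the five rows of df.head(); the real df has 21 columns.
df = pd.DataFrame({
    "A1_Score": ["1", "1", "1", "1", "1"],
    "A2_Score": ["1", "1", "1", "1", "0"],
    "gender": ["f", "m", "m", "f", "f"],
    "Class/ASD": ["NO", "NO", "YES", "NO", "NO"],
})

# Target: 1 = positive screen ("YES"), 0 = negative ("NO").
y = df["Class/ASD"].map({"NO": 0, "YES": 1})
# Features: every remaining column.
X = df.drop(columns=["Class/ASD"])

print(X.columns.tolist())
print(y.value_counts())  # class balance, useful when judging accuracy later
```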

In [0]:
# Import the libraries

import numpy as np
import pandas as pd

In [0]:
# Requires the liac-arff package: pip install liac-arff

import arff

In [0]:
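# liac-arff returns a dict ('attributes', 'data', ...); missing '?' values become None
with open('Autism-Adult-Data.arff') as f:
    dataset = arff.load(f)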

In [0]:
dataset

{'attributes': [('A1_Score', ['0', '1']),
  ('A2_Score', ['0', '1']),
  ('A3_Score', ['0', '1']),
  ('A4_Score', ['0', '1']),
  ('A5_Score', ['0', '1']),
  ('A6_Score', ['0', '1']),
  ('A7_Score', ['0', '1']),
  ('A8_Score', ['0', '1']),
  ('A9_Score', ['0', '1']),
  ('A10_Score', ['0', '1']),
  ('age', 'NUMERIC'),
  ('gender', ['f', 'm']),
  ('ethnicity',
   ['White-European',
    'Latino',
    'Others',
    'Black',
    'Asian',
    'Middle Eastern ',
    'Pasifika',
    'South Asian',
    'Hispanic',
    'Turkish',
    'others']),
  ('jundice', ['no', 'yes']),
  ('austim', ['no', 'yes']),
  ('contry_of_res',
   ['United States',
    'Brazil',
    'Spain',
    'Egypt',
    'New Zealand',
    'Bahamas',
    'Burundi',
    'Austria',
    'Argentina',
    'Jordan',
    'Ireland',
    'United Arab Emirates',
    'Afghanistan',
    'Lebanon',
    'United Kingdom',
    'South Africa',
    'Italy',
    'Pakistan',
    'Bangladesh',
    'Chile',
    'France',
    'China',
    'Australia',
  

In [0]:
len(dataset['attributes'])

21

In [0]:
# Each attribute is a (name, type) pair; keep only the names
col_names = [name for name, _ in dataset['attributes']]

col_names

['A1_Score',
 'A2_Score',
 'A3_Score',
 'A4_Score',
 'A5_Score',
 'A6_Score',
 'A7_Score',
 'A8_Score',
 'A9_Score',
 'A10_Score',
 'age',
 'gender',
 'ethnicity',
 'jundice',
 'austim',
 'contry_of_res',
 'used_app_before',
 'result',
 'age_desc',
 'relation',
 'Class/ASD']

In [0]:
data = np.array(dataset['data'])

In [0]:
data

array([['1', '1', '1', ..., '18 and more', 'Self', 'NO'],
       ['1', '1', '0', ..., '18 and more', 'Self', 'NO'],
       ['1', '1', '0', ..., '18 and more', 'Parent', 'YES'],
       ...,
       ['1', '0', '1', ..., '18 and more', None, 'YES'],
       ['1', '0', '0', ..., '18 and more', 'Self', 'NO'],
       ['1', '0', '1', ..., '18 and more', 'Self', 'YES']], dtype=object)

In [0]:
df = pd.DataFrame(data, columns=col_names)

In [0]:
df.head()

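| | A1_Score | A2_Score | A3_Score | A4_Score | A5_Score | A6_Score | A7_Score | A8_Score | A9_Score | A10_Score | ... | gender | ethnicity | jundice | austim | contry_of_res | used_app_before | result | age_desc | relation | Class/ASD |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 1 | 1 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | ... | f | White-European | no | no | United States | no | 6 | 18 and more | Self | NO |
| 1 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 1 | ... | m | Latino | no | yes | Brazil | no | 5 | 18 and more | Self | NO |
| 2 | 1 | 1 | 0 | 1 | 1 | 0 | 1 | 1 | 1 | 1 | ... | m | Latino | yes | yes | Spain | no | 8 | 18 and more | Parent | YES |
| 3 | 1 | 1 | 0 | 1 | 0 | 0 | 1 | 1 | 0 | 1 | ... | f | White-European | no | yes | United States | no | 6 | 18 and more | Self | NO |
| 4 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | ... | f | None | no | no | Egypt | no | 2 | 18 and more | None | NO |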
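Because `np.array(dataset['data'])` mixes strings and None, the array has `dtype=object` (as its printout shows), so the DataFrame columns start out as generic objects. A minimal sketch of converting them before analysis, assuming the numeric columns are the ten `A*_Score` answers plus `age` and `result`, and treating everything else as categorical. It uses rows 0 and 4 of `df.head()` for a subset of columns; `age` is omitted because the printout hides it.

```python
import numpy as np
import pandas as pd

# Rows 0 and 4 of df.head() for a subset of columns, held in an object
# array the way np.array(dataset['data']) holds them; None marks a missing value.
raw = np.array([
    ["1", "1", "6", "f", "Self", "NO"],
    ["1", "0", "2", "f", None, "NO"],
], dtype=object)
df = pd.DataFrame(raw, columns=["A1_Score", "A2_Score", "result",
                                "gender", "relation", "Class/ASD"])
print(df.dtypes)

# Assumed numeric columns: the A*_Score answers, plus age and result in the full df.
numeric_cols = [c for c in df.columns if c.endswith("_Score") or c in ("age", "result")]
for col in numeric_cols:
    df[col] = pd.to_numeric(df[col], errors="coerce")  # unparseable values -> NaN

# Everything else is treated as categorical text (None stays missing).
for col in df.columns.difference(numeric_cols):
    df[col] = df[col].astype("category")

print(df.dtypes)
```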


## Question: Perform Descriptive Statistics on the dataset and come up with insights
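A possible starting point, sketched on a few columns of the five `df.head()` rows with numeric columns already converted: `describe()` summarises numeric columns, `describe(exclude="number")` gives count, unique, top and freq for text columns, and `value_counts(normalize=True)` shows class balance. On the full data the same calls would, for example, confirm that `age_desc` has a single value. With only five rows, the numbers illustrate the mechanics, not real insights.

```python
import pandas as pd

# Columns from the five rows of df.head(); age is omitted because the
# printout hides it behind "...".
df = pd.DataFrame({
    "result": [6, 5, 8, 6, 2],
    "gender": ["f", "m", "m", "f", "f"],
    "jundice": ["no", "no", "yes", "no", "no"],
    "Class/ASD": ["NO", "NO", "YES", "NO", "NO"],
})

summary = df.describe()                      # count/mean/std/min/quartiles/max
cat_summary = df.describe(exclude="number")  # count/unique/top/freq
balance = df["Class/ASD"].value_counts(normalize=True)  # class proportions

print(summary)
print(cat_summary)
print(balance)
```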

## Question: Perform Feature Engineering on the dataset and remove null values
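One way to approach this step, sketched on illustrative rows built from the category values in the ARFF attribute lists (not real records): count missing values (liac-arff returns missing `?` entries as None), tidy the inconsistent ethnicity labels, fill missing categories with an explicit 'Unknown' level, drop uninformative columns, and one-hot encode for a distance-based model. Filling with 'Unknown' and dropping `result` (on the assumption that it is just the sum of the ten answers) are modelling choices, not requirements; for numeric columns such as `age`, a median fill is a common alternative.

```python
import pandas as pd

# Illustrative rows using category values from the ARFF attribute lists.
df = pd.DataFrame({
    "ethnicity": ["White-European", "Middle Eastern ", "others", "Others", None],
    "relation": ["Self", "Parent", None, "Self", "Relative"],
    "age_desc": ["18 and more"] * 5,
    "result": [6, 5, 8, 6, 2],
    "Class/ASD": ["NO", "NO", "YES", "NO", "NO"],
})

print(df.isna().sum())  # missing values per column

# 1. Tidy labels: strip the stray space and merge 'others' into 'Others'.
df["ethnicity"] = df["ethnicity"].str.strip().replace({"others": "Others"})

# 2. Fill missing categorical values with an explicit 'Unknown' level.
for col in ["ethnicity", "relation"]:
    df[col] = df[col].fillna("Unknown")

# 3. Drop age_desc (constant) and result (assumed sum of A1..A10).
df = df.drop(columns=["age_desc", "result"])

# 4. One-hot encode the categorical features as 0/1 columns.
X = pd.get_dummies(df.drop(columns=["Class/ASD"]), dtype=int)
print(X.columns.tolist())
```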

## Question: Perform EDA on the dataset and explain your observations 
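Two EDA views that fit this data, sketched on the five `df.head()` rows: the share of positive screens within each gender (`pd.crosstab` with `normalize="index"`), and the proportion of `1` answers to each question split by class, where a large YES minus NO gap suggests a discriminative question. On five rows the numbers only demonstrate the mechanics; the conclusions have to come from the full dataset.

```python
import pandas as pd

# A1..A10, gender and Class/ASD for the five rows printed by df.head().
scores = [
    [1, 1, 1, 1, 0, 0, 1, 1, 0, 0],
    [1, 1, 0, 1, 0, 0, 0, 1, 0, 1],
    [1, 1, 0, 1, 1, 0, 1, 1, 1, 1],
    [1, 1, 0, 1, 0, 0, 1, 1, 0, 1],
    [1, 0, 0, 0, 0, 0, 0, 1, 0, 0],
]
score_cols = [f"A{i}_Score" for i in range(1, 11)]
df = pd.DataFrame(scores, columns=score_cols)
df["gender"] = ["f", "m", "m", "f", "f"]
df["Class/ASD"] = ["NO", "NO", "YES", "NO", "NO"]

# Share of each class within each gender (each row sums to 1).
ct = pd.crosstab(df["gender"], df["Class/ASD"], normalize="index")
print(ct)

# Proportion of '1' answers per question by class, and the YES - NO gap.
item_rates = df.groupby("Class/ASD")[score_cols].mean().T
item_rates["gap"] = item_rates["YES"] - item_rates["NO"]
print(item_rates.sort_values("gap", ascending=False))
```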

## Question: Perform Classification using the KNN algorithm and evaluate its performance.
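A from-scratch sketch of k-nearest-neighbours classification and its evaluation, using NumPy only. In practice, scikit-learn's `KNeighborsClassifier` with `train_test_split` and `confusion_matrix` would be the usual route; this version just makes the mechanics visible. It assumes the features are already numeric (for example one-hot encoded), uses Euclidean distance with a majority vote, and runs on synthetic data generated under the assumed AQ-10 rule (positive when the ten answers sum to more than 6), not on the real dataset. `knn_predict` and `evaluate` are hypothetical helper names.

```python
import numpy as np


def knn_predict(X_train, y_train, X_test, k=5):
    """Predict 0/1 labels by majority vote among the k nearest training rows
    (Euclidean distance)."""
    # Pairwise squared distances, shape (n_test, n_train).
    d2 = ((X_test[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argsort(d2, axis=1, kind="stable")[:, :k]
    votes = y_train[nearest]  # labels of the k neighbours
    return (votes.mean(axis=1) > 0.5).astype(int)  # with even k, ties go to 0


def evaluate(y_true, y_pred):
    """Confusion-matrix counts plus accuracy, precision and recall for class 1."""
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    tn = int(np.sum((y_true == 0) & (y_pred == 0)))
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))
    return {
        "confusion": [[tn, fp], [fn, tp]],  # rows: true 0/1, cols: predicted 0/1
        "accuracy": (tp + tn) / len(y_true),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Synthetic stand-in: ten random 0/1 answers per person, labelled with the
# assumed AQ-10 rule (score > 6 -> 1). Not the real dataset.
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(300, 10))
y = (X.sum(axis=1) > 6).astype(int)

# Shuffled 75/25 train/test split.
idx = rng.permutation(len(X))
cut = int(0.75 * len(X))
train, test = idx[:cut], idx[cut:]

for k in (1, 3, 5, 7):
    y_pred = knn_predict(X[train], y[train], X[test], k=k)
    print(k, evaluate(y[test], y_pred))
```

Positives are the minority class under this rule, so precision and recall on the YES class say more than accuracy alone. Comparing several values of k on a held-out split (or with cross-validation) is the usual way to choose k. If numeric columns such as `age` are kept, they should be scaled first, because KNN distances are dominated by large-valued features.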