## Understanding the data

Multiple sclerosis is the most common immune-mediated disorder affecting the central nervous system.

Nearly one million people in the United States had MS in 2022, and about 2.8 million people were affected globally in 2020, with rates varying widely among regions and populations.

The disease usually begins between the ages of 20 and 50 and is twice as common in women as in men.

MS was first described in 1868 by French neurologist Jean-Martin Charcot.

The name "multiple sclerosis" is short for multiple cerebro-spinal sclerosis, which refers to the numerous glial scars (or sclerae – essentially plaques or lesions) that develop on the white matter of the brain and spinal cord.

In [2]:
# libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

In [4]:
# reading the data
df = pd.read_csv("multiple sclerosis.csv")

In [7]:
df.shape

(273, 20)

In [8]:
df.head()

Unnamed: 0.1,Unnamed: 0,Gender,Age,Schooling,Breastfeeding,Varicella,Initial_Symptom,Mono_or_Polysymptomatic,Oligoclonal_Bands,LLSSEP,ULSSEP,VEP,BAEP,Periventricular_MRI,Cortical_MRI,Infratentorial_MRI,Spinal_Cord_MRI,Initial_EDSS,Final_EDSS,group
0,0,1,34,20.0,1,1,2.0,1,0,1,1,0,0,0,1,0,1,1.0,1.0,1
1,1,1,61,25.0,3,2,10.0,2,1,1,0,1,0,0,0,0,1,2.0,2.0,1
2,2,1,22,20.0,3,1,3.0,1,1,0,0,0,0,0,1,0,0,1.0,1.0,1
3,3,2,41,15.0,1,1,7.0,2,1,0,1,1,0,1,1,0,0,1.0,1.0,1
4,4,2,34,20.0,2,1,6.0,2,0,1,0,0,0,1,0,0,0,1.0,1.0,1


## Data description

ID: patient identifier<br>
Age: age of the patient (in years)<br>
Schooling: time the patient spent in school (in years)<br>
Gender: 1=male, 2=female<br>
Breastfeeding: 1=yes, 2=no, 3=unknown<br>
Varicella: 1=positive, 2=negative, 3=unknown<br>
Initial_Symptom:
> 1=visual<br>
> 2=sensory<br>
> 3=motor<br>
> 4=other<br>
> 5=visual and sensory<br>
> 6=visual and motor<br>
> 7=visual and other<br>
> 8=sensory and motor<br>
> 9=sensory and other<br>
> 10=motor and other<br>
> 11=visual, sensory and motor<br>
> 12=visual, sensory and other<br>
> 13=visual, motor and other<br>
> 14=sensory, motor and other<br>
> 15=visual, sensory, motor and other<br>

Mono_or_Polysymptomatic: 1=monosymptomatic, 2=polysymptomatic, 3=unknown<br>
Oligoclonal_Bands: 0=negative, 1=positive, 2=unknown<br>
LLSSEP: 0=negative, 1=positive<br>
ULSSEP: 0=negative, 1=positive<br>
VEP: 0=negative, 1=positive<br>
BAEP: 0=negative, 1=positive<br>
Periventricular_MRI: 0=negative, 1=positive<br>
Cortical_MRI: 0=negative, 1=positive<br>
Infratentorial_MRI: 0=negative, 1=positive<br>
Spinal_Cord_MRI: 0=negative, 1=positive<br>
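Initial_EDSS: initial Expanded Disability Status Scale (EDSS) score (value coding not documented)<br>
Final_EDSS: final Expanded Disability Status Scale (EDSS) score (value coding not documented)<br>
group: 1=CDMS (clinically definite multiple sclerosis), 2=non-CDMS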
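
The numeric codes listed above can be translated into readable labels before plotting or tabulating. The following is a minimal sketch, not part of the original notebook: the `sample` frame holds three rows copied from the outputs in this notebook (rows 0, 3 and 268, limited to three columns), and the names `gender_labels`, `group_labels`, `symptom_labels` and `decoded` are hypothetical.

```python
import pandas as pd

# Three rows copied from the notebook outputs (rows 0, 3 and 268),
# restricted to three coded columns for brevity.
sample = pd.DataFrame(
    {
        "Gender": [1, 2, 2],
        "Initial_Symptom": [2.0, 7.0, 9.0],
        "group": [1, 1, 2],
    },
    index=[0, 3, 268],
)

# Lookup tables taken from the data description (hypothetical names).
gender_labels = {1: "male", 2: "female"}
group_labels = {1: "CDMS", 2: "non-CDMS"}
symptom_labels = {
    1: "visual",
    2: "sensory",
    3: "motor",
    4: "other",
    5: "visual and sensory",
    6: "visual and motor",
    7: "visual and other",
    8: "sensory and motor",
    9: "sensory and other",
    10: "motor and other",
    11: "visual, sensory and motor",
    12: "visual, sensory and other",
    13: "visual, motor and other",
    14: "sensory, motor and other",
    15: "visual, sensory, motor and other",
}

# Series.map replaces each code with its label; codes missing from a
# lookup table would become NaN rather than raising an error.
decoded = sample.assign(
    Gender=sample["Gender"].map(gender_labels),
    Initial_Symptom=sample["Initial_Symptom"].map(symptom_labels),
    group=sample["group"].map(group_labels),
)
print(decoded)
```

Keeping the decoded values in a separate frame leaves the original integer codes available for any later modelling step.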


In [15]:
df.tail()

Unnamed: 0.1,Unnamed: 0,Gender,Age,Schooling,Breastfeeding,Varicella,Initial_Symptom,Mono_or_Polysymptomatic,Oligoclonal_Bands,LLSSEP,ULSSEP,VEP,BAEP,Periventricular_MRI,Cortical_MRI,Infratentorial_MRI,Spinal_Cord_MRI,Initial_EDSS,Final_EDSS,group
268,268,2,31,8.0,3,1,9.0,2,0,0,0,0,0,0,0,0,0,,,2
269,269,1,21,15.0,3,3,5.0,2,1,0,0,0,0,0,0,0,1,,,2
270,270,2,19,12.0,3,3,13.0,2,0,1,1,1,0,0,0,0,1,,,2
271,271,2,32,15.0,3,3,15.0,2,1,1,1,1,0,1,1,1,0,,,2
272,272,2,77,6.0,3,3,2.0,1,0,0,1,0,0,0,0,0,0,,,2
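
The last rows shown above have empty Initial_EDSS and Final_EDSS values, so it is worth counting missing values before going further. The sketch below is an illustration only: `sample` contains four rows copied from the head and tail outputs (rows 0, 1, 268 and 269), and the counts it produces describe this small sample, not the full dataset. On the real data the same calls would be applied to `df`.

```python
import numpy as np
import pandas as pd

# Rows 0 and 1 (from head) and rows 268 and 269 (from tail),
# restricted to a few columns.
sample = pd.DataFrame(
    {
        "Age": [34, 61, 31, 21],
        "Initial_EDSS": [1.0, 2.0, np.nan, np.nan],
        "Final_EDSS": [1.0, 2.0, np.nan, np.nan],
        "group": [1, 1, 2, 2],
    },
    index=[0, 1, 268, 269],
)

# Number of missing values in each column.
missing_per_column = sample.isna().sum()

# Number of missing Initial_EDSS values within each group.
missing_by_group = sample["Initial_EDSS"].isna().groupby(sample["group"]).sum()

print(missing_per_column)
print(missing_by_group)
```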


In [12]:
# display all rows
pd.set_option('display.max_rows', None)
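
Setting `display.max_rows` with `pd.set_option` changes the display for the rest of the session. As an alternative sketch, `pd.option_context` applies the setting only inside a `with` block and restores the previous value afterwards. The `numbers` frame below is a hypothetical stand-in for `df`.

```python
import pandas as pd

# Hypothetical stand-in for df: a frame with more rows than the
# default display limit.
numbers = pd.DataFrame({"n": range(100)})

previous_max_rows = pd.get_option("display.max_rows")

# Inside the block every row is rendered; the option is restored on exit.
with pd.option_context("display.max_rows", None):
    inside_max_rows = pd.get_option("display.max_rows")
    full_text = repr(numbers)

restored_max_rows = pd.get_option("display.max_rows")
```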

In [13]:
df

Unnamed: 0.1,Unnamed: 0,Gender,Age,Schooling,Breastfeeding,Varicella,Initial_Symptom,Mono_or_Polysymptomatic,Oligoclonal_Bands,LLSSEP,ULSSEP,VEP,BAEP,Periventricular_MRI,Cortical_MRI,Infratentorial_MRI,Spinal_Cord_MRI,Initial_EDSS,Final_EDSS,group
0,0,1,34,20.0,1,1,2.0,1,0,1,1,0,0,0,1,0,1,1.0,1.0,1
1,1,1,61,25.0,3,2,10.0,2,1,1,0,1,0,0,0,0,1,2.0,2.0,1
2,2,1,22,20.0,3,1,3.0,1,1,0,0,0,0,0,1,0,0,1.0,1.0,1
3,3,2,41,15.0,1,1,7.0,2,1,0,1,1,0,1,1,0,0,1.0,1.0,1
4,4,2,34,20.0,2,1,6.0,2,0,1,0,0,0,1,0,0,0,1.0,1.0,1
5,5,1,29,22.0,1,1,6.0,2,0,1,0,0,0,1,0,1,0,1.0,1.0,1
6,6,2,53,20.0,1,1,14.0,2,0,1,0,1,0,1,1,0,1,1.0,1.0,1
7,7,2,24,15.0,1,1,14.0,2,0,1,1,0,0,1,1,1,1,2.0,2.0,1
8,8,1,36,15.0,1,1,8.0,2,0,1,1,1,0,1,0,0,0,1.0,1.0,1
9,9,2,28,20.0,1,1,8.0,2,0,0,0,0,0,1,0,1,0,1.0,1.0,1


## Data at first glance

> Dimensions: 273 rows, 20 columns

> The Unnamed: 0 column only repeats the row index and is of no use for the analysis
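
A column named Unnamed: 0 typically appears when a CSV was saved together with its index, so the first header field is empty. Assuming that is what happened here, the sketch below shows two ways to deal with it. The `csv_text` string is a small in-memory stand-in for the file, built from three rows and three columns of the output above; `df_raw`, `df_clean` and `df_indexed` are hypothetical names.

```python
import io

import pandas as pd

# In-memory stand-in for the CSV file: the first header field is empty,
# which pandas names "Unnamed: 0" on load.
csv_text = """,Gender,Age,group
0,1,34,1
1,1,61,1
2,1,22,1
"""

df_raw = pd.read_csv(io.StringIO(csv_text))

# Option 1: drop the redundant column after loading.
df_clean = df_raw.drop(columns=["Unnamed: 0"])

# Option 2: tell read_csv to use the first column as the index.
df_indexed = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(df_clean.shape, df_indexed.shape)
```

Either option leaves only the analysis columns. With the real file, the same `drop` call on `df` would reduce the 20 columns to 19.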