# Data Collection - Parkinson's Disease Detection

This notebook downloads the UCI Parkinson's Dataset for voice-based disease detection.

**Dataset**: Oxford Parkinson's Disease Detection Dataset  
**Source**: UCI Machine Learning Repository  
**Files**: parkinsons.data, parkinsons.names


In [1]:
import pandas as pd
import os

# Create data directory if it doesn't exist
os.makedirs('../data', exist_ok=True)
print("Data directory ready!")


Data directory ready!


## Download Dataset

Download the Parkinson's dataset files (`parkinsons.data` and `parkinsons.names`) from the UCI Machine Learning Repository.


In [2]:
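# Download parkinsons.data
!wget -O ../data/parkinsons.data https://archive.ics.uci.edu/ml/machine-learning-databases/parkinsons/parkinsons.data

# Download parkinsons.names
!wget -O ../data/parkinsons.names https://archive.ics.uci.edu/ml/machine-learning-databases/parkinsons/parkinsons.names

# Report success only if both files exist and are non-empty
# (wget -O can leave an empty file behind when the request fails)
files = ['../data/parkinsons.data', '../data/parkinsons.names']
if all(os.path.isfile(p) and os.path.getsize(p) > 0 for p in files):
    print("Dataset downloaded successfully!")
else:
    print("Download failed: check the wget output above.")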


--2025-06-17 21:57:32--  https://archive.ics.uci.edu/ml/machine-learning-databases/parkinsons/parkinsons.data
Resolving archive.ics.uci.edu (archive.ics.uci.edu)... 128.195.10.252
Connecting to archive.ics.uci.edu (archive.ics.uci.edu)|128.195.10.252|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified
Saving to: ‘../data/parkinsons.data’

../data/parkinsons.     [  <=>               ]  39.74K   111KB/s    in 0.4s    

2025-06-17 21:57:33 (111 KB/s) - ‘../data/parkinsons.data’ saved [40697]

--2025-06-17 21:57:34--  https://archive.ics.uci.edu/ml/machine-learning-databases/parkinsons/parkinsons.names
Resolving archive.ics.uci.edu (archive.ics.uci.edu)... 128.195.10.252
Connecting to archive.ics.uci.edu (archive.ics.uci.edu)|128.195.10.252|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified
Saving to: ‘../data/parkinsons.names’

../data/parkinsons.     [ <=>                ]   3.01K  --.-KB/s    in 0s      

2025-06-17 2
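
The `!wget` lines above depend on `wget` being installed on the machine that runs the notebook. Where it is not, the same two files can be fetched from Python. The following is a minimal sketch rather than part of the original notebook: `download_file` is a hypothetical helper name, and `BASE_URL` is simply the URL prefix copied from the commands above.

```python
import shutil
import urllib.request
from pathlib import Path

# URL prefix copied from the wget commands in this notebook.
BASE_URL = "https://archive.ics.uci.edu/ml/machine-learning-databases/parkinsons"


def download_file(url, dest):
    """Download url to dest and return the number of bytes written.

    Hypothetical helper, not part of the original notebook.
    """
    dest = Path(dest)
    dest.parent.mkdir(parents=True, exist_ok=True)
    # urlopen raises URLError/HTTPError on failure, before dest is opened,
    # so a failed request does not leave an empty file behind.
    with urllib.request.urlopen(url, timeout=30) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out)
    size = dest.stat().st_size
    if size == 0:
        raise RuntimeError(f"{dest} is empty after download")
    return size
```

A call such as `download_file(f"{BASE_URL}/parkinsons.data", "../data/parkinsons.data")` would then replace the first `wget` line. The returned byte count can be compared with the size `wget` reported in the log above (40697 bytes for `parkinsons.data`), assuming the hosted file has not changed since that run.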

## Data Verification

Quick verification of the downloaded files: load the dataset with pandas, preview the first rows with `df.head()`, and print the start of the metadata file.


In [3]:
# Load and verify the main dataset
df = pd.read_csv('../data/parkinsons.data')

print("Dataset shape:", df.shape)
print("\nFirst 5 rows:")
print(df.head())

print(f"\nDataset contains {len(df)} voice recordings")
print(f"Number of columns: {len(df.columns)} ({len(df.columns) - 2} features plus 'name' and 'status')")
print(f"Target variable 'status': {df['status'].value_counts().to_dict()}")


Dataset shape: (195, 24)

First 5 rows:
             name  MDVP:Fo(Hz)  MDVP:Fhi(Hz)  MDVP:Flo(Hz)  MDVP:Jitter(%)  \
0  phon_R01_S01_1      119.992       157.302        74.997         0.00784   
1  phon_R01_S01_2      122.400       148.650       113.819         0.00968   
2  phon_R01_S01_3      116.682       131.111       111.555         0.01050   
3  phon_R01_S01_4      116.676       137.871       111.366         0.00997   
4  phon_R01_S01_5      116.014       141.781       110.655         0.01284   

   MDVP:Jitter(Abs)  MDVP:RAP  MDVP:PPQ  Jitter:DDP  MDVP:Shimmer  ...  \
0           0.00007   0.00370   0.00554     0.01109       0.04374  ...   
1           0.00008   0.00465   0.00696     0.01394       0.06134  ...   
2           0.00009   0.00544   0.00781     0.01633       0.05233  ...   
3           0.00009   0.00502   0.00698     0.01505       0.05492  ...   
4           0.00011   0.00655   0.00908     0.01966       0.06425  ...   

   Shimmer:DDA      NHR     HNR  status      R
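
Printing the shape and the first rows confirms that the file parses, but it does not catch missing values or unexpected labels. A slightly stricter check could look like the sketch below. `check_parkinsons_frame` is a hypothetical helper, not part of the original notebook, and it assumes that `status` is a binary label coded as 0/1 and that `name` is an identifier column rather than a feature.

```python
import pandas as pd


def check_parkinsons_frame(df):
    """Return a small summary dict; raise ValueError on structural problems.

    Hypothetical helper, not part of the original notebook.
    """
    missing = {"name", "status"} - set(df.columns)
    if missing:
        raise ValueError(f"missing expected columns: {sorted(missing)}")
    # Assumption: 'status' is a binary label coded 0/1.
    labels = set(df["status"].unique())
    if not labels <= {0, 1}:
        raise ValueError(f"unexpected status values: {labels}")
    # Everything except the identifier and the target counts as a feature.
    feature_cols = df.columns.drop(["name", "status"])
    return {
        "rows": len(df),
        "features": len(feature_cols),
        "missing_values": int(df.isna().sum().sum()),
        "status_counts": df["status"].value_counts().to_dict(),
    }
```

Applied to the frame loaded in the cell above, `check_parkinsons_frame(df)` should report 195 rows and, given the 24 columns in the printed shape, 22 feature columns.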

In [4]:
# Verify the metadata file
print("Dataset metadata (first few lines):")
with open('../data/parkinsons.names', 'r') as f:
    lines = f.readlines()
    for i, line in enumerate(lines[:10]):
        print(f"{i+1}: {line.strip()}")

print("\n✓ Data collection completed!")
print("✓ Files saved in ../data/ directory")
print("✓ Ready for exploratory data analysis")


Dataset metadata (first few lines):
1: Title: Parkinsons Disease Data Set
2: 
3: Abstract: Oxford Parkinson's Disease Detection Dataset
4: 
5: -----------------------------------------------------
6: 
7: Data Set Characteristics: Multivariate
8: Number of Instances: 197
9: Area: Life
10: Attribute Characteristics: Real

✓ Data collection completed!
✓ Files saved in ../data/ directory
✓ Ready for exploratory data analysis
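
The metadata printed above declares 197 instances, while the data file loaded as 195 rows. A mismatch like this is worth surfacing before analysis starts. The sketch below extracts the declared count so it can be compared with `len(df)`. `declared_instances` is a hypothetical helper, and the regular expression assumes the `Number of Instances:` wording shown in the output above.

```python
import re


def declared_instances(names_text):
    """Return the 'Number of Instances' value from the metadata text, or None.

    Hypothetical helper, not part of the original notebook.
    """
    match = re.search(r"Number of Instances:\s*(\d+)", names_text)
    return int(match.group(1)) if match else None
```

In the notebook this could be used by reading `../data/parkinsons.names` into a string, passing it to `declared_instances`, and printing a warning when the result differs from `len(df)`. The sketch only reports the difference; it does not explain which of the two numbers is correct.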
