## Data Loading and Initial Exploration

We begin our analysis by loading the breast cancer classification dataset and inspecting its structure. This dataset contains diagnostic information about tumors, with the goal of building a machine learning model to distinguish between malignant and benign cases.

---

### Steps in This Notebook

1. **Import required libraries**  
   Use essential Python libraries such as `pandas` for data handling and analysis.

2. **Load the dataset**  
   Read the cancer classification dataset from a `.csv` file.

3. **Explore the dataset**  
   - View the shape and first few rows  
   - Check data types and non-null counts  
   - Review basic summary statistics  
   - Count missing values per column

---

### Data Loading and Previewing

We load the dataset with `pandas.read_csv()`. The file lives in the `data/raw/` folder of the project directory. Once it is loaded, we preview the first few rows and inspect the overall structure.
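Because the file sits in `data/raw/` inside the project, the path can be built relative to the project root rather than hard-coded for one machine. The following is a minimal sketch, not the notebook's own method: the helper name `dataset_path` and the `PROJECT_ROOT` variable are hypothetical, and it assumes the notebook is launched from the project root.

```python
from pathlib import Path

# Assumption: the notebook is started from the project root, so that
# data/raw/ sits directly below the current working directory.
PROJECT_ROOT = Path.cwd()


def dataset_path(root: Path = PROJECT_ROOT) -> Path:
    """Build the path to the raw CSV relative to a project root (hypothetical helper)."""
    return root / "data" / "raw" / "cancer_classification.csv"


file_path = dataset_path()

# Report a wrong working directory early instead of failing inside read_csv.
if not file_path.exists():
    print(f"Dataset not found at {file_path}; adjust PROJECT_ROOT.")
```

The resulting `file_path` can be passed to `pd.read_csv()` unchanged, since pandas accepts `pathlib.Path` objects as well as strings.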




In [1]:
import pandas as pd

# Load the dataset
file_path = "C:/Users/sanja/1. Breast_Cancer_Tumor_Classifier/1.Breast_Cancer_Tumor_Classifier/data/raw/cancer_classification.csv"
df = pd.read_csv(file_path)

# Basic structure
print("Shape of the dataset:", df.shape)
print("\nFirst 5 rows:")
print(df.head())

print("\nData Types and Nulls:")
df.info()  # info() prints its report directly and returns None, so it is not wrapped in print()

print("\nStatistical Summary:")
print(df.describe())

print("\nMissing values per column:")
print(df.isnull().sum())


Shape of the dataset: (569, 31)

First 5 rows:
   mean radius  mean texture  mean perimeter  mean area  mean smoothness  \
0        17.99         10.38          122.80     1001.0          0.11840   
1        20.57         17.77          132.90     1326.0          0.08474   
2        19.69         21.25          130.00     1203.0          0.10960   
3        11.42         20.38           77.58      386.1          0.14250   
4        20.29         14.34          135.10     1297.0          0.10030   

   mean compactness  mean concavity  mean concave points  mean symmetry  \
0           0.27760          0.3001              0.14710         0.2419   
1           0.07864          0.0869              0.07017         0.1812   
2           0.15990          0.1974              0.12790         0.2069   
3           0.28390          0.2414              0.10520         0.2597   
4           0.13280          0.1980              0.10430         0.1809   

   mean fractal dimension  ...  worst texture