# Breast Cancer Diagnosis using Machine Learning: A CBIS-DDSM Image Dataset Analysis

## Introduction
Breast cancer is one of the leading causes of cancer death among women worldwide, and early detection is crucial for successful treatment and recovery. The CBIS-DDSM (Curated Breast Imaging Subset of DDSM) Breast Cancer Image Dataset is a publicly available collection of mammography images for breast cancer detection. In this project, we will use machine learning to build a model that classifies mammography images as benign or malignant.

## Dataset
The CBIS-DDSM Breast Cancer Image Dataset contains over 2,500 mammography images covering both benign and malignant findings. Each image is labeled with its corresponding diagnosis.

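```python
# Set up Kaggle API credentials from environment variables
import os
from dotenv import load_dotenv

# load_dotenv() copies KAGGLE_USERNAME and KAGGLE_KEY from a local .env file
# into os.environ, so no manual re-assignment is needed.
load_dotenv()
for var in ("KAGGLE_USERNAME", "KAGGLE_KEY"):
    if not os.getenv(var):
        raise RuntimeError(f"{var} is not set; add it to .env or the environment")

# Import kaggle only after the credentials are loaded, because the package
# may try to authenticate as soon as it is imported.
import kaggle

data_dir = "../data"

# Define the Kaggle datasets to download.
# NOTE: this slug does not name CBIS-DDSM; replace it with the CBIS-DDSM
# dataset slug if that is the intended download.
dataset_names = ['mukhazarahmad/worldwide-cancer-data']

kaggle.api.authenticate()
for dataset_name in dataset_names:
    kaggle.api.dataset_download_files(dataset_name, path=data_dir, unzip=True)
```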

## Data Preprocessing
Before building our model, we need to preprocess the data: load the images, resize them to a common shape, and split the dataset into training and testing sets. We can use Pillow, the maintained fork of the Python Imaging Library (PIL), to load and resize the images.
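A minimal sketch of these preprocessing steps, assuming the images are stored as 8-bit JPEG or PNG files and that the labels are available as a list in the same order. The function names `load_and_resize` and `stratified_split`, the 224×224 target size, and the 80/20 split are illustrative assumptions, not values taken from the dataset.

```python
import random
from collections import defaultdict

import numpy as np
from PIL import Image


def load_and_resize(path, size=(224, 224)):
    """Load one image as grayscale, resize it, and scale pixels to [0, 1]."""
    with Image.open(path) as img:
        img = img.convert("L")   # mammograms are single-channel
        img = img.resize(size)   # PIL sizes are (width, height)
        return np.asarray(img, dtype=np.float32) / 255.0


def stratified_split(labels, test_fraction=0.2, seed=42):
    """Return (train_idx, test_idx) with similar class ratios in both sets."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        # Keep at least one example of every class in the test set.
        n_test = max(1, round(len(indices) * test_fraction))
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)
```

Splitting per class keeps the benign-to-malignant ratio similar in both sets. Because one patient can contribute several mammogram views, grouping the split by patient rather than by image is likely a safer choice when patient identifiers are available.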

## Building the Model
To build our machine learning model, we will use a convolutional neural network (CNN). CNNs are well suited to image classification because their convolutional filters learn local spatial patterns directly from pixel data. We will use the Keras library to build and train our CNN.
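As one possible starting point, the sketch below defines a small Keras CNN for binary classification. The layer sizes, the dropout rate, the 224×224×1 input shape, and the helper name `build_cnn` are all assumptions chosen for illustration; they are not a tuned architecture for CBIS-DDSM.

```python
from tensorflow import keras
from tensorflow.keras import layers


def build_cnn(input_shape=(224, 224, 1)):
    """Small binary classifier: three conv layers, then a sigmoid output."""
    model = keras.Sequential([
        keras.Input(shape=input_shape),
        layers.Conv2D(16, 3, activation="relu", padding="same"),
        layers.MaxPooling2D(),
        layers.Conv2D(32, 3, activation="relu", padding="same"),
        layers.MaxPooling2D(),
        layers.Conv2D(64, 3, activation="relu", padding="same"),
        layers.GlobalAveragePooling2D(),
        layers.Dropout(0.3),
        layers.Dense(1, activation="sigmoid"),  # estimated P(malignant)
    ])
    model.compile(
        optimizer="adam",
        loss="binary_crossentropy",
        metrics=["accuracy"],
    )
    return model
```

Training would then be a call such as `build_cnn().fit(x_train, y_train, validation_split=0.1, epochs=10)`, where `x_train` and `y_train` are hypothetical arrays of shape `(n, 224, 224, 1)` and `(n,)` holding the preprocessed images and their 0/1 labels.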

## Evaluation
Once our model is trained, we will evaluate its performance on the testing set by measuring accuracy, precision, recall, and F1 score. We will also generate a confusion matrix to show how the predictions are distributed across the true classes.
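The metrics above can be sketched in plain Python so that each definition is explicit; scikit-learn's `sklearn.metrics` module offers equivalent functions. The helper name `binary_metrics` is hypothetical, label 1 is assumed to mean malignant, and the toy labels are made up for illustration rather than taken from the dataset.

```python
def binary_metrics(y_true, y_pred):
    """Confusion-matrix counts and derived metrics; label 1 = malignant."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "confusion_matrix": [[tn, fp], [fn, tp]],  # rows: true, cols: predicted
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Toy labels for illustration only; these are not results from the dataset.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
print(binary_metrics(y_true, y_pred))
```

For a screening task, recall on the malignant class usually deserves particular attention, since a false negative corresponds to a missed cancer.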


## Conclusion
In this project, we used the CBIS-DDSM Breast Cancer Image Dataset to build a machine learning model that classifies mammography images as benign or malignant. A model of this kind could serve as a tool to assist radiologists in detecting breast cancer early, which may lead to better patient outcomes.