# Exploratory Data Analysis Report

This notebook performs an automated EDA on the `data.csv` file. All outputs are generated automatically via GitHub Actions.

In [None]:
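import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from IPython.display import display  # explicit import; Jupyter kernels also provide display() implicitly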

# Set plot style
sns.set_theme(style="whitegrid")

## 1. Load and Inspect Data

In [None]:
CSV_COLUMNS = [
    'Name', 'Address', 'PostalCode', 'City', 'Canton', 'Phone', 
    'Email', 'Website', 'Specialty', 'Source'
]
df = pd.read_csv('data.csv', names=CSV_COLUMNS, header=0)

print("First 5 rows:")
display(df.head())
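Passing both `names=` and `header=0` is what makes this load correct. pandas reads the file's first line as a header and then discards it in favor of `CSV_COLUMNS`. If `header=0` were omitted, `names=` alone would imply `header=None`, and the header line would be read as a data row. The sketch below illustrates both cases with a small in-memory CSV. The sample rows and the four-column subset are hypothetical stand-ins for `data.csv`.

```python
import io

import pandas as pd

# Hypothetical four-column sample standing in for data.csv (not real rows)
RAW = (
    "name,street,zip,city\n"
    "Dr. A,Main St 1,8001,Zurich\n"
    "Dr. B,Lake Rd 2,1201,Geneva\n"
)
COLS = ["Name", "Address", "PostalCode", "City"]

# header=0: the first line is a header, and names= replaces it
df_ok = pd.read_csv(io.StringIO(RAW), names=COLS, header=0)

# names= alone implies header=None, so the header line becomes a data row
df_wrong = pd.read_csv(io.StringIO(RAW), names=COLS)

print(len(df_ok), len(df_wrong))  # 2 rows vs 3 rows
print(df_wrong.iloc[0].tolist())  # the old header text, now read as data
```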

## 2. Basic Information & Data Quality

In [None]:
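print(f"Dataset shape: {df.shape[0]} rows and {df.shape[1]} columns.")
print("\nColumn info (dtypes and non-null counts):")
# info() prints its report directly and returns None, so it is not wrapped in display()
df.info()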
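`DataFrame.info()` prints its summary (column names, non-null counts, dtypes, memory usage) straight to the output and returns `None`, so passing its result to `display()` renders nothing useful. The sketch below, on a hypothetical two-column frame, shows that return value and two alternatives: capturing the report as text through the `buf=` parameter, or using `df.dtypes` when only the column types are needed.

```python
import io

import pandas as pd

# Hypothetical frame used only to illustrate DataFrame.info()
df_demo = pd.DataFrame({"Canton": ["ZH", None, "GE"], "PostalCode": [8001, 3000, 1201]})

# info() writes its report to stdout and returns None
returned = df_demo.info()
print(returned)  # None

# To keep the report as text (for example, to log it in CI), pass a buffer via buf=
buffer = io.StringIO()
df_demo.info(buf=buffer)
report = buffer.getvalue()

# If only the column types are needed, df.dtypes is a Series that display() can render
print(df_demo.dtypes)
```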

In [None]:
print("Missing Values per Column:")
display(df.isnull().sum())
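Raw counts are easier to judge next to the share of rows they represent. The sketch below puts both in one table, assuming a hypothetical frame with gaps in `Email` and `Phone`. `read_csv` turns empty fields into `NaN` by default, but placeholders outside pandas' default NA strings (for example `"-"`) are read as ordinary text and would not be counted here.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with gaps, standing in for the real data
df_demo = pd.DataFrame({
    "Email": ["a@example.com", np.nan, np.nan, "d@example.com"],
    "Phone": ["p1", "p2", np.nan, "p4"],
})

# isnull() flags NaN/None cells; sum() counts them, mean() gives the fraction per column
missing = pd.DataFrame({
    "missing": df_demo.isnull().sum(),
    "percent": (df_demo.isnull().mean() * 100).round(1),
}).sort_values("missing", ascending=False)
print(missing)
```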

In [None]:
print(f"Number of duplicate rows: {df.duplicated().sum()}")
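`df.duplicated()` only flags rows that match an earlier row in every column, and with the default `keep="first"` the first copy is not counted. For listings gathered from several sources, the same entry can differ only in a column such as `Source`, so a check on a subset of columns may be more informative. The sketch below compares the two on hypothetical rows; the `Name` + `PostalCode` subset is an illustrative assumption, not a rule from this notebook.

```python
import pandas as pd

# Hypothetical rows: one exact duplicate, plus the same entry seen from a second source
df_demo = pd.DataFrame({
    "Name": ["Dr. A", "Dr. A", "Dr. A", "Dr. B"],
    "PostalCode": [8001, 8001, 8001, 1201],
    "Source": ["site1", "site1", "site2", "site1"],
})

# Exact duplicates: all columns must match; the first occurrence is not flagged
exact_dupes = df_demo.duplicated().sum()

# Looser check on a hypothetical subset of identifying columns
subset_dupes = df_demo.duplicated(subset=["Name", "PostalCode"]).sum()

print(exact_dupes, subset_dupes)  # 1 2
```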

## 3. Univariate Analysis

### Top 10 Cantons

In [None]:
top_cantons = df['Canton'].value_counts().nlargest(10)
print(top_cantons)

plt.figure(figsize=(12, 6))
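# Mapping hue to the categories applies the palette per bar; palette without hue is deprecated in seaborn >= 0.13
sns.barplot(x=top_cantons.index, y=top_cantons.values, hue=top_cantons.index, palette='viridis', legend=False)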
plt.title('Top 10 Cantons by Number of Entries')
plt.xlabel('Canton')
plt.ylabel('Count')
plt.show()
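`value_counts()` already returns counts in descending order and drops missing values by default, so rows without a canton never appear in this ranking. The sketch below, on a hypothetical Series, shows how `dropna=False` exposes that missing bucket and how `normalize=True` returns shares instead of counts.

```python
import numpy as np
import pandas as pd

# Hypothetical canton column with several missing values
cantons = pd.Series(["ZH", "ZH", "BE", np.nan, np.nan, np.nan])

# Default: NaN is dropped, so only 3 of the 6 rows are counted
counts = cantons.value_counts()
print(counts)

# dropna=False keeps a NaN bucket, showing how many rows have no canton at all
with_missing = cantons.value_counts(dropna=False)
print(with_missing)

# normalize=True returns each canton's share of the non-missing rows
shares = cantons.value_counts(normalize=True)
print(shares)
```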

### Top 10 Specialties

In [None]:
top_specialties = df['Specialty'].value_counts().nlargest(10)
print(top_specialties)

plt.figure(figsize=(12, 8))
sns.barplot(x=top_specialties.values, y=top_specialties.index, hue=top_specialties.index, palette='plasma', orient='h', legend=False)
plt.title('Top 10 Specialties by Number of Entries')
plt.xlabel('Count')
plt.ylabel('Specialty')
plt.show()
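If the raw `Specialty` labels vary in case or spacing, `value_counts()` treats each variant as a separate category and can split one specialty across several bars. The sketch below shows one possible normalization before counting, on hypothetical labels. The choice of `.str.title()` is an assumption; it would also rewrite acronyms such as "ENT" as "Ent", so `.str.casefold()` may suit some data better.

```python
import pandas as pd

# Hypothetical raw labels: the same specialty with different case and spacing
specialties = pd.Series(["Cardiology", "cardiology ", " Cardiology", "Dermatology", None])

raw_counts = specialties.value_counts()

# Trim surrounding whitespace, collapse internal runs of spaces, and normalize case
normalized = (
    specialties.str.strip()
    .str.replace(r"\s+", " ", regex=True)
    .str.title()
)
clean_counts = normalized.value_counts()
print(raw_counts, clean_counts, sep="\n\n")
```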

## 4. EDA Summary

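This notebook gives a snapshot of `data.csv`: its shape and column types, its data quality (missing values per column and duplicate rows), and the distribution of entries across the ten most common cantons and specialties.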