# Automated Data Visualisation with Generative AI
This project generates visualisations automatically for an uploaded CSV or Excel dataset. The tool identifies numerical and categorical columns, then creates suitable charts, such as scatter plots, histograms, and box plots, based on the dataset's structure, so it adapts to new data without manual configuration.

## Step 1: Importing Libraries
To build this project, we use several libraries:

* **Pandas:** for data handling
* **Matplotlib:** for basic visualisations
* **Seaborn:** for sample datasets
* **Plotly:** for interactive visualisations
* **ipywidgets:** for creating user input widgets

If **Plotly** is not already installed, the `pip` command in the following cell installs it. The cell then imports the libraries listed above:

```python
# Install Plotly if not already installed
!pip install plotly

# Import all necessary libraries
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import plotly.express as px
import ipywidgets as widgets
from IPython.display import display
```

## Step 2: File Upload for Custom Dataset
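This section allows users to upload their own dataset, replacing the sample data. Supported file types are .xlsx and .csv. The upload button comes from `google.colab.files`, so this cell runs only in Google Colab. Every later section reads from the same `df` variable, so once a file is uploaded, the following visualisations use the new data.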

```python
from google.colab import files

# Sample data used until a file is uploaded (the iris dataset is an assumed choice)
df = sns.load_dataset('iris')

# Create an upload button
uploaded = files.upload()

# Load the uploaded file into a DataFrame; unsupported files leave df unchanged
for filename in uploaded.keys():
    name = filename.lower()  # Accept upper-case extensions such as .CSV
    if name.endswith('.xlsx'):
        df = pd.read_excel(filename)  # Use read_excel for Excel files
        print(f"Loaded '{filename}' successfully as an Excel file!")
    elif name.endswith('.csv'):
        df = pd.read_csv(filename)  # Use read_csv for CSV files
        print(f"Loaded '{filename}' successfully as a CSV file!")
    else:
        print(f"Unsupported file type: {filename}")

display(df.head())  # Show a preview of the data
```
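`files.upload()` only works inside Google Colab. It returns a dictionary that maps each file name to the file's contents as bytes, so the extension check can be factored into a small helper that parses those bytes directly and can be reused or tested outside Colab. This is a minimal sketch: `load_uploaded_file` is a hypothetical name, and reading `.xlsx` files assumes the `openpyxl` package is installed.

```python
import io

import pandas as pd


def load_uploaded_file(filename, content):
    """Hypothetical helper: parse uploaded bytes into a DataFrame by file extension.

    Returns None for unsupported file types instead of raising an error.
    """
    name = filename.lower()  # Treat .CSV and .csv the same way
    buffer = io.BytesIO(content)
    if name.endswith('.xlsx'):
        # pandas needs the openpyxl package to read .xlsx files
        return pd.read_excel(buffer)
    if name.endswith('.csv'):
        return pd.read_csv(buffer)
    return None


# Example with CSV bytes built in memory, so no upload is needed
sample = load_uploaded_file('Sales.CSV', b"region,revenue\nNorth,120\nSouth,95\n")
print(sample.shape)  # (2, 2)
```

In the Colab cell, the loop could then call `load_uploaded_file(filename, content)` for each `filename, content` pair in `uploaded.items()`, keeping the previous `df` whenever it returns `None`.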

## Step 3: Column Type Detection and Data Overview
This section identifies numerical and categorical columns in the dataset. These classifications allow us to generate suitable visualisations automatically based on the data’s structure.

```python
# Basic data overview
print("Dataset Shape:", df.shape)
print("Summary Statistics:")
display(df.describe(include='all'))

# Detect numerical and categorical columns
# 'number' matches every numeric dtype (int32, float32, ...), not just int64/float64
numerical_columns = df.select_dtypes(include='number').columns
categorical_columns = df.select_dtypes(include=['object', 'category', 'bool']).columns

print("Numerical Columns:", list(numerical_columns))
print("Categorical Columns:", list(categorical_columns))
```
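The same detection is repeated inside `auto_visualise` and `suggest_visualisations`. A minimal sketch of moving it into one shared helper, assuming every numeric dtype should count as numerical and `bool` columns should be treated as categorical; `detect_column_types` is a hypothetical name, not part of pandas.

```python
import pandas as pd


def detect_column_types(df):
    """Hypothetical helper: split column names into numerical and categorical lists."""
    # 'number' matches all numeric dtypes (int8 ... int64, float32, float64, ...)
    numerical = list(df.select_dtypes(include='number').columns)
    # Text columns, pandas categoricals and booleans are treated as categories
    categorical = list(df.select_dtypes(include=['object', 'category', 'bool']).columns)
    return numerical, categorical


example = pd.DataFrame({
    'age': pd.Series([34, 29, 41], dtype='int32'),  # missed by ['float64', 'int64']
    'score': [0.5, 0.9, 0.7],
    'team': ['red', 'blue', 'red'],
    'active': [True, False, True],
})
numerical, categorical = detect_column_types(example)
print(numerical)    # ['age', 'score']
print(categorical)  # ['team', 'active']
```

Each function could then start with `numerical_columns, categorical_columns = detect_column_types(df)`, so a change to the rules only has to be made once.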

## Step 4: Automated Visualisation Generation
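This function generates visualisations automatically based on the detected column types. It creates a scatter plot, a histogram, and a box plot when the dataset has suitable columns, always using the first matching columns it finds (for example, the first two numerical columns for the scatter plot).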

```python
def auto_visualise(df):
    # Detect numerical and categorical columns
    # ('number' matches every numeric dtype, not just int64/float64)
    numerical_columns = df.select_dtypes(include='number').columns
    categorical_columns = df.select_dtypes(include=['object', 'category', 'bool']).columns

    # Scatter plot for two numerical columns (if available)
    if len(numerical_columns) >= 2:
        # Use the first categorical column for colour, if available
        color_column = categorical_columns[0] if len(categorical_columns) > 0 else None

        # Generate scatter plot using the first two numerical columns
        fig = px.scatter(df, x=numerical_columns[0], y=numerical_columns[1],
                         title=f'Scatter Plot: {numerical_columns[0]} vs {numerical_columns[1]}',
                         color=color_column)  # None draws all points in one colour
        fig.show()
    else:
        print("Scatter plot: Need at least two numerical columns.")

    # Histogram for a single numerical column (if available)
    if len(numerical_columns) >= 1:
        # Generate histogram for the first numerical column
        fig = px.histogram(df, x=numerical_columns[0],
                           title=f'Histogram of {numerical_columns[0]}',
                           color_discrete_sequence=['blue'])  # Draw all bars in blue
        fig.show()
    else:
        print("Histogram: No numerical columns available.")

    # Box plot for one numerical and one categorical column (if available)
    if len(numerical_columns) >= 1 and len(categorical_columns) >= 1:
        # Generate box plot with first numerical and first categorical column
        fig = px.box(df, x=categorical_columns[0], y=numerical_columns[0],
                     title=f'Box Plot: {numerical_columns[0]} by {categorical_columns[0]}',
                     color_discrete_sequence=['blue'])  # Draw all boxes in blue
        fig.show()
    else:
        print("Box plot: Need at least one numerical and one categorical column.")
```
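`suggest_visualisations` recommends a bar chart for categorical columns, but `auto_visualise` does not draw one. A hedged sketch of how it could be added: count each category with `value_counts` and pass the counts to `px.bar`. Both helper names are hypothetical, and Plotly is imported inside `category_bar_chart` so that `category_counts` can be reused without it.

```python
import pandas as pd


def category_counts(df, column):
    """Hypothetical helper: one row per category with its number of occurrences."""
    counts = df[column].value_counts(dropna=False).reset_index()
    # Name the columns explicitly, because reset_index() names differ across pandas versions
    counts.columns = [column, 'count']
    return counts


def category_bar_chart(df, column):
    """Hypothetical helper: bar chart of category counts (requires Plotly)."""
    import plotly.express as px

    counts = category_counts(df, column)
    return px.bar(counts, x=column, y='count', title=f'Bar Chart: Counts of {column}')


example = pd.DataFrame({'team': ['red', 'blue', 'red', 'green', 'red', 'blue']})
print(category_counts(example, 'team'))
```

Inside `auto_visualise`, a call such as `category_bar_chart(df, categorical_columns[0]).show()` could sit alongside the other charts, guarded by the same `len(categorical_columns) >= 1` check that the box plot uses.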

## Step 5: Suggest Ideal Visualisations
This step defines a rule-based function that checks which column types the dataset contains and returns a list of suitable visualisations, each with a short note on what it is good for.

```python
def suggest_visualisations(df):
    # Detect numerical and categorical columns
    # ('number' matches every numeric dtype, not just int64/float64)
    numerical_columns = df.select_dtypes(include='number').columns
    categorical_columns = df.select_dtypes(include=['object', 'category', 'bool']).columns

    suggestions = []

    # Suggest visualisations based on the presence of numerical and categorical columns
    if len(numerical_columns) >= 2:
        suggestions.append("Scatter Plot: Ideal for exploring relationships between two numerical variables.")

    if len(numerical_columns) >= 1:
        suggestions.append("Histogram: Useful for understanding the distribution of a numerical variable.")

    if len(numerical_columns) >= 1 and len(categorical_columns) >= 1:
        suggestions.append("Box Plot: Great for comparing distributions of a numerical variable across categories.")

    if len(categorical_columns) >= 1:
        suggestions.append("Bar Chart: Effective for visualising the counts of categorical data.")

    if not suggestions:
        suggestions.append("No suitable visualisations could be suggested based on the dataset structure.")

    return suggestions
```
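The suggestions are plain strings, so nothing links them to the columns a chart would use. One possible refinement, sketched here under our own assumptions, returns each suggestion as a small dictionary naming the chart type and its columns, and skips categorical columns with too many distinct values to read on an axis. `suggest_chart_specs` and the default `max_categories=20` threshold are hypothetical choices, not part of the original notebook.

```python
import pandas as pd


def suggest_chart_specs(df, max_categories=20):
    """Hypothetical variant of suggest_visualisations that also names the columns to plot."""
    numerical = list(df.select_dtypes(include='number').columns)
    categorical = list(df.select_dtypes(include=['object', 'category', 'bool']).columns)

    # Keep only categorical columns with a readable number of distinct values
    readable = [c for c in categorical if df[c].nunique() <= max_categories]

    specs = []
    if len(numerical) >= 2:
        specs.append({'chart': 'scatter', 'columns': numerical[:2]})
    if numerical:
        specs.append({'chart': 'histogram', 'columns': numerical[:1]})
    if numerical and readable:
        # Box plot: categories on x, values on y
        specs.append({'chart': 'box', 'columns': [readable[0], numerical[0]]})
    for column in readable:
        specs.append({'chart': 'bar', 'columns': [column]})
    return specs


example = pd.DataFrame({
    'height': [1.6, 1.8, 1.7],
    'weight': [55, 80, 68],
    'team': ['red', 'blue', 'red'],
    'user_id': ['a1', 'b2', 'c3'],  # an identifier, not a useful category
})
for spec in suggest_chart_specs(example, max_categories=2):
    print(spec)
```

With `max_categories=2`, `user_id` (three distinct values) is left out, while `team` still gets a box plot and a bar chart. A plotting function could loop over these specs instead of re-deriving the columns itself.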

## Step 6: Display Suggested Visualisations
Run the following code to get suggestions for the best visualisations based on the current dataset.

```python
# Suggest ideal visualisations based on the dataset
visualisation_suggestions = suggest_visualisations(df)
print("Suggested Visualisations:")
for suggestion in visualisation_suggestions:
    print(f"- {suggestion}")
```

## Step 7: Execute Auto-Visualisation
Run the auto_visualise function below to create visualisations based on the current dataset.

```python
# Generate the automatic visualisations for the current dataset
auto_visualise(df)
```

## Summary and Next Steps
This project provides an interactive data visualisation tool: users upload a CSV or Excel dataset, and the notebook generates Plotly charts from it automatically. The suggestions come from simple rules based on the detected column types (for example, two or more numerical columns lead to a scatter plot suggestion), which guides users towards charts that suit their data.

Together with the automatic chart generation, this makes the tool a flexible starting point for data exploration and insight generation.

**Next Steps:**
* Customisable Colour Options: Introduce the ability for users to select custom colours for different types of visualisations to enhance personalisation and visual appeal.

* Additional Visualisation Types: Expand the variety of graph types available, such as pie charts, line graphs, and heatmaps, to provide users with more options for data representation.

* Refinement of Suggestion Mechanisms: Make the suggestions account for more dataset characteristics than column types alone, such as the number of distinct values in a column or the share of missing data.

* Complex Data Upload Features: Enable connections to external databases and support a broader array of file types for improved versatility.
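Of the additional chart types listed above, a heatmap fits naturally with the existing column detection: compute pairwise correlations between the numerical columns with `DataFrame.corr` and render the matrix with Plotly's `px.imshow`. This is a minimal sketch assuming Pearson correlation (the `corr` default) is what you want; both function names are hypothetical, and Plotly is imported inside `correlation_heatmap` so that `numeric_correlations` can be used on its own.

```python
import pandas as pd


def numeric_correlations(df):
    """Hypothetical helper: Pearson correlation matrix of the numerical columns."""
    return df.select_dtypes(include='number').corr()


def correlation_heatmap(df):
    """Hypothetical helper: heatmap of the correlation matrix (requires Plotly)."""
    import plotly.express as px

    corr = numeric_correlations(df)
    # Fix the colour range to [-1, 1] so a colour means the same thing for every dataset
    return px.imshow(corr, zmin=-1, zmax=1, color_continuous_scale='RdBu_r',
                     title='Correlation Heatmap of Numerical Columns')


example = pd.DataFrame({
    'x': [1, 2, 3, 4],
    'y': [2, 4, 6, 8],  # perfectly positively correlated with x
    'z': [4, 3, 2, 1],  # perfectly negatively correlated with x
})
print(numeric_correlations(example).round(2))
```

A heatmap only makes sense with at least two numerical columns, so in `auto_visualise` it would belong under the same `len(numerical_columns) >= 2` check as the scatter plot.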