{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Enhanced Cyber Attacks Data Analysis\n",
    "\n",
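    "This notebook analyzes cyber attack incident data: monthly trends, the most targeted victims, attack motives, attacker-to-victim location flows, attack types by year, and the most affected industries."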
   ]
  },
  {
   "cell_type": "code",
   "metadata": {},
   "source": [
    "# Install plotly if it is not already available in this kernel's environment\n",
    "%pip install plotly"
   ]
  },
  {
   "cell_type": "code",
   "metadata": {},
   "source": [
    "import numpy as np\n",
    "import pandas as pd\n",
    "import matplotlib.pyplot as plt\n",
    "import seaborn as sns\n",
    "import plotly.express as px\n",
    "from datetime import datetime"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Data Loading and Feature Engineering"
   ]
  },
  {
   "cell_type": "code",
   "metadata": {},
   "source": [
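    "df = pd.read_csv('all_recs.csv', encoding='unicode_escape')\n",
    "\n",
    "# Strip surrounding whitespace from string cells before parsing anything.\n",
    "# Series.map works across pandas versions; DataFrame.applymap is deprecated.\n",
    "df = df.apply(lambda col: col.map(lambda x: x.strip() if isinstance(x, str) else x))\n",
    "\n",
    "# Parse dates (day first); unparseable values become NaT\n",
    "df['Date'] = pd.to_datetime(df['Date'], dayfirst=True, errors='coerce')\n",
    "\n",
    "# Add 'Month' and 'Year' columns for time-based analysis\n",
    "df['Month'] = df['Date'].dt.to_period('M')\n",
    "df['Year'] = df['Date'].dt.year"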
   ]
  },
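  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Because `errors='coerce'` silently turns unparseable dates into `NaT`, it can help to check how many rows lost their date before trusting the time-based plots. The cell below is a minimal sketch of such a check; the names `n_total` and `n_missing` are introduced here for illustration only."
   ]
  },
  {
   "cell_type": "code",
   "metadata": {},
   "source": [
    "# Count rows whose 'Date' is missing or could not be parsed (NaT)\n",
    "n_total = len(df)\n",
    "n_missing = int(df['Date'].isna().sum())\n",
    "print(f'{n_missing} of {n_total} rows have no usable date')"
   ]
  },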
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Data Overview"
   ]
  },
  {
   "cell_type": "code",
   "metadata": {},
   "source": [
    "display(df.head())\n",
    "df.info()\n",
    "df.describe(include='all')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 1. Monthly Attack Trends (Line Plot)"
   ]
  },
  {
   "cell_type": "code",
   "metadata": {},
   "source": [
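    "monthly_counts = df.groupby('Month').size().reset_index(name='Attacks')\n",
    "# Plotly cannot serialize pandas Period values, so convert to month-start timestamps\n",
    "monthly_counts['Month'] = monthly_counts['Month'].dt.to_timestamp()\n",
    "\n",
    "fig = px.line(monthly_counts, x='Month', y='Attacks', title='Monthly Cyber Attack Trend', markers=True)\n",
    "fig.update_xaxes(rangeslider_visible=True)\n",
    "fig.show()"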
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "> **Insight:** Peaks mark months with more recorded attacks and valleys mark quieter months. Try correlating the peaks with global events."
   ]
  },
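  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "One way to pick candidate peaks for that comparison is to flag months whose count sits well above average. This is a rough sketch: the cutoff of two standard deviations above the mean is an assumption chosen for illustration, not something derived from the dataset, and the names `counts`, `threshold`, and `peak_months` are introduced here."
   ]
  },
  {
   "cell_type": "code",
   "metadata": {},
   "source": [
    "# Flag months whose attack count exceeds mean + 2 * std (assumed, adjustable cutoff)\n",
    "counts = df.groupby('Month').size()\n",
    "threshold = counts.mean() + 2 * counts.std()\n",
    "peak_months = counts[counts > threshold].sort_values(ascending=False)\n",
    "\n",
    "print(f'Threshold: {threshold:.1f} attacks per month')\n",
    "print(peak_months)"
   ]
  },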
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2. Top 10 Most Targeted Victims"
   ]
  },
  {
   "cell_type": "code",
   "metadata": {},
   "source": [
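    "# Build a tidy two-column frame so Plotly can label the axes from column names\n",
    "top_victims = df['Victim'].value_counts().head(10).rename_axis('Victim').reset_index(name='Attack Count')\n",
    "fig = px.bar(top_victims, x='Attack Count', y='Victim', orientation='h',\n",
    "             title='Top 10 Most Targeted Victims')\n",
    "fig.update_yaxes(autorange='reversed')  # most targeted victim on top\n",
    "fig.show()"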
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 3. Attack Motives (Pie Chart)"
   ]
  },
  {
   "cell_type": "code",
   "metadata": {},
   "source": [
    "motive_counts = df['Motive'].value_counts()\n",
    "fig = px.pie(values=motive_counts.values, names=motive_counts.index, title='Attack Motives (All Time)')\n",
    "fig.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 4. Attacker vs Victim Location Heatmap"
   ]
  },
  {
   "cell_type": "code",
   "metadata": {},
   "source": [
    "# Only keep top 10 locations for each to keep the plot readable\n",
    "top_actor_locs = df['Actor Location'].value_counts().head(10).index\n",
    "top_victim_locs = df['Location'].value_counts().head(10).index\n",
    "heatmap_df = df[df['Actor Location'].isin(top_actor_locs) & df['Location'].isin(top_victim_locs)]\n",
    "\n",
    "# Cross-tabulate incident counts; unlike counting 'Date', this also includes rows whose date is missing\n",
    "pivot = pd.crosstab(heatmap_df['Actor Location'], heatmap_df['Location'])\n",
    "plt.figure(figsize=(10,8))\n",
    "sns.heatmap(pivot, annot=True, fmt='d', cmap='YlGnBu')\n",
    "plt.title('Attack Flows: Top Attacker to Victim Locations')\n",
    "plt.tight_layout()\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
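    "> **Insight:** Each cell counts the incidents attributed to an attacker location (row) against a victim location (column), limited to the ten most frequent locations on each axis."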
   ]
  },
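  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Raw counts favour the most active attacker locations, so a row-normalized view can make each attacker's target mix easier to compare. The cell below is a sketch under that assumption; it reuses `top_actor_locs` and `top_victim_locs` from the previous cell, and `subset` and `shares` are names introduced here."
   ]
  },
  {
   "cell_type": "code",
   "metadata": {},
   "source": [
    "# Share of each attacker location's incidents that hit each victim location (rows sum to 1)\n",
    "subset = df[df['Actor Location'].isin(top_actor_locs) & df['Location'].isin(top_victim_locs)]\n",
    "shares = pd.crosstab(subset['Actor Location'], subset['Location'], normalize='index')\n",
    "\n",
    "plt.figure(figsize=(10, 8))\n",
    "sns.heatmap(shares, annot=True, fmt='.0%', cmap='YlGnBu')\n",
    "plt.title('Share of Attacks by Victim Location, per Attacker Location')\n",
    "plt.tight_layout()\n",
    "plt.show()"
   ]
  },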
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 5. Attack Type Over the Years (Stacked Area)"
   ]
  },
  {
   "cell_type": "code",
   "metadata": {},
   "source": [
    "# Count attacks per year and type; groupby drops rows whose year is missing\n",
    "pivot_type = df.groupby(['Year', 'Type']).size().unstack(fill_value=0)\n",
    "pivot_type.index = pivot_type.index.astype(int)  # years are floats when some dates failed to parse\n",
    "pivot_type.plot(kind='area', stacked=True, figsize=(12,6), colormap='tab20')\n",
    "plt.title('Attack Types Over the Years')\n",
    "plt.ylabel('Number of Attacks')\n",
    "plt.xlabel('Year')\n",
    "plt.tight_layout()\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
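    "> **Insight:** The height of each band is the number of attacks of that type recorded in a year, so bands that widen or narrow over time indicate types becoming more or less prevalent."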
   ]
  },
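  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To put numbers on that shift, one option is to compare each type's share of all attacks in the first and last year of the data. This is only a sketch: comparing two endpoint years is an assumption made for simplicity and ignores the years in between, and the names `type_counts`, `type_share`, and `share_change` are introduced here."
   ]
  },
  {
   "cell_type": "code",
   "metadata": {},
   "source": [
    "# Yearly counts per attack type, then each type's share of that year's total\n",
    "type_counts = df.groupby(['Year', 'Type']).size().unstack(fill_value=0)\n",
    "type_share = type_counts.div(type_counts.sum(axis=1), axis=0)\n",
    "\n",
    "# Change in share between the earliest and latest year (positive = more prevalent)\n",
    "share_change = (type_share.iloc[-1] - type_share.iloc[0]).sort_values()\n",
    "print(share_change)"
   ]
  },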
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 6. Correlation Heatmap for Numeric Columns"
   ]
  },
  {
   "cell_type": "code",
   "metadata": {},
   "source": [
    "# Add a simple numeric feature: victim name length (for demo only)\n",
    "# .str.len() leaves missing victims as NaN instead of measuring the string 'nan'\n",
    "df['Victim_Length'] = df['Victim'].str.len()\n",
    "numeric_df = df[['Year', 'Victim_Length']].dropna()\n",
    "corr = numeric_df.corr()\n",
    "plt.figure(figsize=(5, 4))\n",
    "sns.heatmap(corr, annot=True, cmap='Blues', vmin=-1, vmax=1)\n",
    "plt.title('Correlation Heatmap (example)')\n",
    "plt.tight_layout()\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 7. Modular Functions Example"
   ]
  },
  {
   "cell_type": "code",
   "metadata": {},
   "source": [
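    "def plot_top_categories(df, column, n=10, title=''):\n",
    "    \"\"\"Plot the n most frequent values of `column` as a horizontal bar chart.\"\"\"\n",
    "    counts = df[column].value_counts().head(n)\n",
    "    fig = px.bar(x=counts.values, y=counts.index, orientation='h', title=title,\n",
    "                 labels={'x': 'Count', 'y': column})\n",
    "    fig.update_yaxes(autorange='reversed')  # most frequent category on top\n",
    "    fig.show()\n",
    "\n",
    "# Example usage:\n",
    "plot_top_categories(df, 'Industry', n=8, title='Top 8 Affected Industries')"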
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Summary & Next Steps\n",
    "\n",
    "- The dataset shows temporal trends, key victims, and attack flows between countries.\n",
    "- Top industries and motives are visualized.\n",
    "- Next: try clustering, prediction, or export insights to a report."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "name": "python",
   "version": "3.10"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}