{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Indonesia Heart Attack Prediction\n",
    "## Notebook 5: Feature Engineering\n",
    "\n",
    "---\n",
    "\n",
    "### Stage 5 of the Data Science Life Cycle\n",
    "\n",
    "In this stage, we will:\n",
    "1. Create new features from existing features\n",
    "2. Transform features to improve model performance\n",
    "3. Select important features\n",
    "4. Encode categorical variables\n",
    "5. Scale numerical features\n",
    "6. Prepare the final dataset for modeling"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 1. Import Libraries dan Load Data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": None,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Data manipulation\n",
    "import pandas as pd\n",
    "import numpy as np\n",
    "\n",
    "# Visualization\n",
    "import matplotlib.pyplot as plt\n",
    "import seaborn as sns\n",
    "\n",
    "# Feature engineering and selection\n",
    "from sklearn.preprocessing import LabelEncoder, StandardScaler\n",
    "from sklearn.feature_selection import SelectKBest, chi2, f_classif, mutual_info_classif\n",
    "\n",
    "# System utilities\n",
    "import sys\n",
    "sys.path.append('../src')\n",
    "\n",
    "# Import custom modules\n",
    "from feature_engineering import FeatureEngineer, get_high_correlation_features, feature_summary\n",
    "from data_preprocessing import DataPreprocessor\n",
    "\n",
    "# Settings\n",
    "pd.set_option('display.max_columns', None)\n",
    "plt.style.use('seaborn-v0_8-whitegrid')\n",
    "sns.set_palette('husl')\n",
    "\n",
    "import warnings\n",
    "warnings.filterwarnings('ignore')\n",
    "\n",
    "print(\"Libraries imported successfully!\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": None,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Load cleaned data\n",
    "df = pd.read_csv('../data/heart_attack_data_cleaned.csv')\n",
    "\n",
    "print(f\"Original dataset shape: {df.shape}\")\n",
    "print(f\"Features: {df.shape[1]}\")\n",
    "\n",
    "# Initialize feature engineer\n",
    "fe = FeatureEngineer()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2. Create New Features"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 2.1 Age-based Features"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": None,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(\"Creating age-based features...\")\n",
    "print(\"=\"*60)\n",
    "\n",
    "# Age groups\n",
    "df_engineered = fe.create_age_groups(df)\n",
    "\n",
    "print(\"\\nAge groups created:\")\n",
    "print(df_engineered['age_group'].value_counts().sort_index())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 2.2 Blood Pressure Categories"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": None,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(\"Creating blood pressure categories...\")\n",
    "print(\"=\"*60)\n",
    "\n",
    "df_engineered = fe.create_blood_pressure_category(df_engineered)\n",
    "\n",
    "print(\"\\nBlood pressure categories:\")\n",
    "print(df_engineered['bp_category'].value_counts())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 2.3 Cholesterol Categories"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": None,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(\"Creating cholesterol categories...\")\n",
    "print(\"=\"*60)\n",
    "\n",
    "df_engineered = fe.create_cholesterol_category(df_engineered)\n",
    "\n",
    "print(\"\\nCholesterol categories:\")\n",
    "print(df_engineered['cholesterol_category'].value_counts())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 2.4 BMI Category"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": None,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(\"Creating BMI categories...\")\n",
    "print(\"=\"*60)\n",
    "\n",
    "df_engineered = fe.create_bmi_category(df_engineered)\n",
    "\n",
    "print(\"\\nBMI categories:\")\n",
    "print(df_engineered['bmi_category'].value_counts())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 2.5 Risk Score (Composite Feature)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": None,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(\"Creating composite risk score...\")\n",
    "print(\"=\"*60)\n",
    "\n",
    "df_engineered = fe.create_risk_score(df_engineered)\n",
    "\n",
    "print(\"\\nRisk score statistics:\")\n",
    "print(df_engineered['risk_score'].describe())\n",
    "\n",
    "# Visualize risk score distribution\n",
    "fig, axes = plt.subplots(1, 2, figsize=(14, 5))\n",
    "\n",
    "# Distribution\n",
    "df_engineered['risk_score'].hist(bins=15, ax=axes[0], color='steelblue', edgecolor='black')\n",
    "axes[0].set_title('Risk Score Distribution', fontsize=12, fontweight='bold')\n",
    "axes[0].set_xlabel('Risk Score')\n",
    "axes[0].set_ylabel('Frequency')\n",
    "axes[0].axvline(df_engineered['risk_score'].mean(), color='red', linestyle='--',\n",
    "               label=f'Mean: {df_engineered[\"risk_score\"].mean():.2f}')\n",
    "axes[0].legend()\n",
    "axes[0].grid(alpha=0.3)\n",
    "\n",
    "# Risk score vs heart attack\n",
    "df_engineered.boxplot(column='risk_score', by='heart_attack', ax=axes[1])\n",
    "axes[1].set_title('Risk Score by Heart Attack Status')\n",
    "axes[1].set_xlabel('Heart Attack')\n",
    "axes[1].set_ylabel('Risk Score')\n",
    "axes[1].set_xticklabels(['No', 'Yes'])\n",
    "plt.sca(axes[1])\n",
    "plt.xticks([1, 2], ['No', 'Yes'])\n",
    "\n",
    "plt.tight_layout()\n",
    "plt.show()\n",
    "\n",
    "# Calculate mean risk score by heart attack status\n",
    "print(\"\\nMean risk score by heart attack status:\")\n",
    "print(df_engineered.groupby('heart_attack')['risk_score'].mean())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 2.6 Interaction Features"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": None,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(\"Creating interaction features...\")\n",
    "print(\"=\"*60)\n",
    "\n",
    "df_engineered = fe.create_interaction_features(df_engineered)\n",
    "\n",
    "print(\"\\nNew interaction features created:\")\n",
    "interaction_features = ['age_hypertension', 'age_diabetes', 'cholesterol_age', \n",
    "                       'bp_interaction', 'total_health_conditions']\n",
    "for feat in interaction_features:\n",
    "    if feat in df_engineered.columns:\n",
    "        print(f\"  - {feat}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 3. Feature Engineering Summary"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": None,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(\"Feature Engineering Summary:\")\n",
    "print(\"=\"*60)\n",
    "\n",
    "summary = feature_summary(df, df_engineered)\n",
    "\n",
    "print(f\"\\nOriginal features: {summary['original_features']}\")\n",
    "print(f\"Engineered features: {summary['engineered_features']}\")\n",
    "print(f\"New features created: {summary['new_features_count']}\")\n",
    "\n",
    "print(f\"\\nList of new features:\")\n",
    "for i, feat in enumerate(summary['new_features_list'], 1):\n",
    "    print(f\"  {i}. {feat}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 4. Encode Categorical Variables"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": None,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(\"Encoding categorical variables...\")\n",
    "print(\"=\"*60)\n",
    "\n",
    "# Identify categorical columns\n",
    "categorical_cols = df_engineered.select_dtypes(include=['object']).columns.tolist()\n",
    "\n",
    "print(f\"\\nCategorical columns to encode: {len(categorical_cols)}\")\n",
    "for col in categorical_cols:\n",
    "    print(f\"  - {col}\")\n",
    "\n",
    "# Initialize preprocessor for encoding\n",
    "preprocessor = DataPreprocessor()\n",
    "\n",
    "# Encode categorical variables\n",
    "df_encoded = preprocessor.encode_categorical(df_engineered, categorical_cols)\n",
    "\n",
    "print(\"\\n✓ Categorical variables encoded successfully!\")\n",
    "print(f\"\\nDataset shape after encoding: {df_encoded.shape}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 5. Feature Correlation Analysis"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": None,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(\"Analyzing feature correlations with target...\")\n",
    "print(\"=\"*60)\n",
    "\n",
    "# Calculate correlation with heart_attack\n",
    "correlations = df_encoded.corr()['heart_attack'].drop('heart_attack').sort_values(ascending=False)\n",
    "\n",
    "print(\"\\nTop 20 features correlated with Heart Attack:\")\n",
    "print(correlations.head(20))\n",
    "\n",
    "# Visualize top correlations\n",
    "plt.figure(figsize=(12, 10))\n",
    "top_n = 20\n",
    "top_corr = correlations.head(top_n)\n",
    "colors = ['green' if x > 0 else 'red' for x in top_corr]\n",
    "\n",
    "plt.barh(range(len(top_corr)), top_corr.values, color=colors)\n",
    "plt.yticks(range(len(top_corr)), top_corr.index)\n",
    "plt.xlabel('Correlation Coefficient', fontsize=12)\n",
    "plt.title(f'Top {top_n} Features Correlated with Heart Attack', fontsize=14, fontweight='bold')\n",
    "plt.axvline(x=0, color='black', linestyle='--', linewidth=0.8)\n",
    "plt.grid(axis='x', alpha=0.3)\n",
    "plt.tight_layout()\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 6. Feature Selection"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 6.1 Correlation-based Feature Selection"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": None,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(\"Correlation-based Feature Selection:\")\n",
    "print(\"=\"*60)\n",
    "\n",
    "# Select features with correlation > threshold\n",
    "threshold = 0.05\n",
    "selected_features_corr = fe.select_features_correlation(df_encoded, 'heart_attack', threshold=threshold)\n",
    "\n",
    "print(f\"\\nFeatures selected (correlation > {threshold}): {len(selected_features_corr)}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 6.2 Univariate Feature Selection"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": None,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(\"Univariate Feature Selection:\")\n",
    "print(\"=\"*60)\n",
    "\n",
    "# Prepare X and y\n",
    "X = df_encoded.drop('heart_attack', axis=1)\n",
    "y = df_encoded['heart_attack']\n",
    "\n",
    "# Select top K features using f_classif\n",
    "k = 20\n",
    "selected_features_univariate, selector = fe.select_features_univariate(X, y, k=k, score_func='f_classif')\n",
    "\n",
    "print(f\"\\nTop {k} features selected.\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 6.3 Check Multicollinearity"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": None,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(\"Checking for multicollinearity...\")\n",
    "print(\"=\"*60)\n",
    "\n",
    "# Find highly correlated features\n",
    "high_corr_pairs = get_high_correlation_features(df_encoded, threshold=0.8)\n",
    "\n",
    "if len(high_corr_pairs) > 0:\n",
    "    print(f\"\\nFound {len(high_corr_pairs)} pairs of highly correlated features (>0.8):\")\n",
    "    for pair in high_corr_pairs[:10]:  # Show first 10\n",
    "        print(f\"  {pair[0]} <-> {pair[1]}: {pair[2]:.3f}\")\n",
    "    \n",
    "    print(\"\\n⚠️  Consider removing one feature from each highly correlated pair.\")\n",
    "else:\n",
    "    print(\"\\n✓ No high multicollinearity detected.\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 7. Prepare Final Feature Set"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": None,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(\"Preparing final feature set...\")\n",
    "print(\"=\"*60)\n",
    "\n",
    "# Option 1: Use all features\n",
    "features_all = X.columns.tolist()\n",
    "\n",
    "# Option 2: Use selected features (union of correlation and univariate)\n",
    "features_selected = list(set(selected_features_corr) | set(selected_features_univariate))\n",
    "\n",
    "# Option 3: Use top features from univariate selection\n",
    "features_top = selected_features_univariate\n",
    "\n",
    "print(f\"\\nFeature set options:\")\n",
    "print(f\"  1. All features: {len(features_all)} features\")\n",
    "print(f\"  2. Selected features (union): {len(features_selected)} features\")\n",
    "print(f\"  3. Top univariate features: {len(features_top)} features\")\n",
    "\n",
    "# We'll use Option 1 (all features) and let the model do feature selection\n",
    "final_features = features_all\n",
    "\n",
    "print(f\"\\n✓ Using all features for modeling: {len(final_features)} features\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 8. Train-Test Split"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": None,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(\"Splitting data into train and test sets...\")\n",
    "print(\"=\"*60)\n",
    "\n",
    "X_final = df_encoded[final_features]\n",
    "y_final = df_encoded['heart_attack']\n",
    "\n",
    "X_train, X_test, y_train, y_test = preprocessor.prepare_data_for_modeling(\n",
    "    df_encoded[final_features + ['heart_attack']], \n",
    "    target_column='heart_attack',\n",
    "    test_size=0.2,\n",
    "    random_state=42\n",
    ")\n",
    "\n",
    "print(f\"\\nTrain set: {X_train.shape}\")\n",
    "print(f\"Test set: {X_test.shape}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 9. Feature Scaling"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": None,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(\"Scaling features...\")\n",
    "print(\"=\"*60)\n",
    "\n",
    "# Scale features\n",
    "X_train_scaled, X_test_scaled = preprocessor.scale_features(X_train, X_test)\n",
    "\n",
    "# Convert back to DataFrame for easier handling\n",
    "X_train_scaled = pd.DataFrame(X_train_scaled, columns=X_train.columns, index=X_train.index)\n",
    "X_test_scaled = pd.DataFrame(X_test_scaled, columns=X_test.columns, index=X_test.index)\n",
    "\n",
    "print(\"\\n✓ Features scaled successfully!\")\n",
    "print(f\"\\nScaled train set: {X_train_scaled.shape}\")\n",
    "print(f\"Scaled test set: {X_test_scaled.shape}\")\n",
    "\n",
    "# Check scaling\n",
    "print(\"\\nScaled data statistics (sample features):\")\n",
    "print(X_train_scaled[['age', 'cholesterol_level', 'blood_pressure_systolic']].describe())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 10. Save Processed Data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": None,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(\"Saving processed datasets...\")\n",
    "print(\"=\"*60)\n",
    "\n",
    "# Save engineered features dataset\n",
    "df_encoded.to_csv('../data/heart_attack_data_engineered.csv', index=False)\n",
    "print(\"✓ Engineered dataset saved: heart_attack_data_engineered.csv\")\n",
    "\n",
    "# Save train/test splits\n",
    "X_train_scaled.to_csv('../data/X_train_scaled.csv', index=False)\n",
    "X_test_scaled.to_csv('../data/X_test_scaled.csv', index=False)\n",
    "y_train.to_csv('../data/y_train.csv', index=False)\n",
    "y_test.to_csv('../data/y_test.csv', index=False)\n",
    "\n",
    "print(\"✓ Train/test splits saved:\")\n",
    "print(\"  - X_train_scaled.csv\")\n",
    "print(\"  - X_test_scaled.csv\")\n",
    "print(\"  - y_train.csv\")\n",
    "print(\"  - y_test.csv\")\n",
    "\n",
    "# Save feature names\n",
    "pd.DataFrame({'feature': final_features}).to_csv('../data/feature_names.csv', index=False)\n",
    "print(\"✓ Feature names saved: feature_names.csv\")\n",
    "\n",
    "# Save preprocessor objects\n",
    "import joblib\n",
    "joblib.dump(preprocessor.scaler, '../models/scaler.pkl')\n",
    "joblib.dump(preprocessor.label_encoders, '../models/label_encoders.pkl')\n",
    "print(\"\\n✓ Preprocessor objects saved:\")\n",
    "print(\"  - scaler.pkl\")\n",
    "print(\"  - label_encoders.pkl\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 11. Feature Engineering Report"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": None,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(\"\\n\" + \"=\"*60)\n",
    "print(\"FEATURE ENGINEERING SUMMARY REPORT\")\n",
    "print(\"=\"*60)\n",
    "\n",
    "print(f\"\\n1. DATASET TRANSFORMATION:\")\n",
    "print(f\"   Original features: {df.shape[1]}\")\n",
    "print(f\"   Engineered features: {df_encoded.shape[1]}\")\n",
    "print(f\"   New features created: {df_encoded.shape[1] - df.shape[1]}\")\n",
    "\n",
    "print(f\"\\n2. NEW FEATURES CREATED:\")\n",
    "new_features = set(df_encoded.columns) - set(df.columns)\n",
    "for i, feat in enumerate(sorted(new_features), 1):\n",
    "    print(f\"   {i:2d}. {feat}\")\n",
    "\n",
    "print(f\"\\n3. CATEGORICAL ENCODING:\")\n",
    "print(f\"   Encoded columns: {len(categorical_cols)}\")\n",
    "print(f\"   Encoding method: Label Encoding\")\n",
    "\n",
    "print(f\"\\n4. FEATURE SELECTION:\")\n",
    "print(f\"   Features by correlation (>{threshold}): {len(selected_features_corr)}\")\n",
    "print(f\"   Features by univariate test (top {k}): {len(selected_features_univariate)}\")\n",
    "print(f\"   Final features for modeling: {len(final_features)}\")\n",
    "\n",
    "print(f\"\\n5. DATA SPLIT:\")\n",
    "print(f\"   Training set: {X_train.shape[0]} samples ({X_train.shape[0]/len(df_encoded)*100:.1f}%)\")\n",
    "print(f\"   Test set: {X_test.shape[0]} samples ({X_test.shape[0]/len(df_encoded)*100:.1f}%)\")\n",
    "\n",
    "print(f\"\\n6. FEATURE SCALING:\")\n",
    "print(f\"   Scaling method: StandardScaler (z-score normalization)\")\n",
    "print(f\"   Scaled features: All numerical features\")\n",
    "\n",
    "print(f\"\\n7. TOP FEATURES (by correlation with target):\")\n",
    "top_5 = correlations.head(5)\n",
    "for i, (feat, corr) in enumerate(top_5.items(), 1):\n",
    "    print(f\"   {i}. {feat:30s}: {corr:+.4f}\")\n",
    "\n",
    "print(f\"\\n8. FILES SAVED:\")\n",
    "print(f\"   - heart_attack_data_engineered.csv (Full dataset with engineered features)\")\n",
    "print(f\"   - X_train_scaled.csv, X_test_scaled.csv (Scaled features)\")\n",
    "print(f\"   - y_train.csv, y_test.csv (Target variables)\")\n",
    "print(f\"   - feature_names.csv (List of features)\")\n",
    "print(f\"   - scaler.pkl, label_encoders.pkl (Preprocessing objects)\")\n",
    "\n",
    "print(\"\\n\" + \"=\"*60)\n",
    "print(\"✓ Feature Engineering Completed Successfully!\")\n",
    "print(\"=\"*60)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Summary\n",
    "\n",
    "In this Feature Engineering stage, we have:\n",
    "\n",
    "1. ✅ **Created New Features**:\n",
    "   - Age groups (categorical)\n",
    "   - Blood pressure categories\n",
    "   - Cholesterol categories\n",
    "   - BMI categories\n",
    "   - Composite risk score\n",
    "   - Interaction features (age × conditions, BP interactions, etc.)\n",
    "\n",
    "2. ✅ **Encoded Categorical Variables**:\n",
    "   - Label encoding for all categorical features\n",
    "   - Preserved label encoders for future use\n",
    "\n",
    "3. ✅ **Feature Selection**:\n",
    "   - Correlation-based selection\n",
    "   - Univariate statistical tests\n",
    "   - Multicollinearity check\n",
    "\n",
    "4. ✅ **Data Preparation**:\n",
    "   - Train-test split (80-20)\n",
    "   - Feature scaling (StandardScaler)\n",
    "   - Stratified sampling to preserve class proportions\n",
    "\n",
    "5. ✅ **Saved Artifacts**:\n",
    "   - Processed datasets\n",
    "   - Preprocessing objects\n",
    "   - Feature names\n",
    "\n",
    "### Key Achievements:\n",
    "- Expanded the feature space with meaningful engineered features\n",
    "- Maintained interpretability of features\n",
    "- Prepared clean, scaled data ready for modeling\n",
    "- Created reusable preprocessing pipeline\n",
    "\n",
    "### Feature Quality:\n",
    "- All features properly encoded\n",
    "- No missing values\n",
    "- Scaled for consistent ranges\n",
    "- Multicollinearity checked (pairs above 0.8 flagged in Section 6.3)\n",
    "\n",
    "### Next Steps:\n",
    "Continue to **Notebook 6: Predictive Modeling** to train machine learning models.\n",
    "\n",
    "---"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.0"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
