A machine learning project that predicts lobbying success from Canadian lobbying registration data. The model predicts whether a lobbying attempt will receive government funding, based on features such as subject matter, registrant location, organizational structure, and target institution.
| Metric | Score |
|---|---|
| Accuracy | 71.5% |
| Precision | 70.3% |
| Recall | 71.7% |
| F1 Score | 70.5% |
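
For readers unfamiliar with how these four metrics relate, the following is a minimal sketch that derives them from confusion-matrix counts. The counts are made up for illustration and are not the notebook's results; the notebook may also average metrics across classes, which this sketch does not do.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute binary-classification metrics for the positive class."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
metrics = classification_metrics(tp=60, fp=20, fn=40, tn=80)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```
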
When predicting government funding success (Class = 1), the model identified these top predictive features:
- Subject Matter: Budget (8.0%), Economic Development (7.8%), Infrastructure (5.9%)
- Target Institution: Natural Resources Canada (44.8%), Finance Canada (25.1%), Agriculture Canada (11.1%)
- Region (Area Code): Ottawa-613 (45.8%), Toronto-416 (24.5%)
- Data Preprocessing: Handling missing values with imputation, encoding categorical features, feature scaling with StandardScaler
- Class Imbalance Handling: RandomUnderSampler to address 33/67 class distribution
- Multiple Models: Logistic Regression, Random Forest, Gradient Boosting, XGBoost
- Evaluation Metrics: Accuracy, Precision, Recall, F1-Score, Confusion Matrix, ROC Curve, Lift Chart, Cumulative Gains
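
The class imbalance step above can be illustrated with a hand-rolled version of random undersampling. This is a sketch of the idea only, not the `RandomUnderSampler` implementation the project uses; the function name `random_undersample` and the toy 33/67 data are assumptions made for the example, including which label is the minority.

```python
import random
from collections import Counter


def random_undersample(rows, labels, seed=0):
    """Randomly drop rows from larger classes until all classes match the smallest."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)

    # Every class is reduced to the size of the smallest one.
    n_min = min(len(group) for group in by_class.values())

    out_rows, out_labels = [], []
    for label, group in by_class.items():
        for row in rng.sample(group, n_min):
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


# Toy data with a 33/67 split (illustrative only).
labels = [1] * 33 + [0] * 67
rows = list(range(100))

bal_rows, bal_labels = random_undersample(rows, labels)
counts = Counter(bal_labels)
print(counts)
```

Undersampling discards majority-class rows, so it is usually applied to the training split only, leaving the test split at its natural distribution.
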
LobbyingPredictionModel/
├── lobbyingModel.ipynb # Main Jupyter notebook with analysis
├── canadacities.csv # Canadian cities reference data
├── images/ # Visualization outputs
│ ├── confusion_matrix.png
│ ├── roc_curve.png
│ └── cumulative_gains_lift.png
├── requirements.txt # Python dependencies
├── LICENSE # MIT License
└── README.md # This file
The primary dataset, `merged lobby.csv` (~995K rows), is not included due to file size.
To obtain the data:
- Visit the Office of the Commissioner of Lobbying of Canada
- Download the lobbying registration exports
- Merge the Primary Export and Subject Matters Export files
- Save as `merged lobby.csv` in the project root
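
The merge step above might look like the following pandas sketch. The join key `REG_ID` is hypothetical, as is the choice of an inner join; check the actual export headers for the shared registration identifier. Small in-memory frames stand in for the two downloaded files.

```python
import pandas as pd

# Toy stand-ins for the two exports. REG_ID is a hypothetical join key.
primary = pd.DataFrame({
    "REG_ID": [1, 2, 3],
    "GOVT_FUND_IND_FIN_GOUV": ["Y", "N", "N"],
})
subjects = pd.DataFrame({
    "REG_ID": [1, 1, 2],
    "SUBJ_MATTER_OBJET": ["Budget", "Infrastructure", "Health"],
})

# Inner join: keeps only registrations present in both exports.
merged = primary.merge(subjects, on="REG_ID", how="inner")
print(merged)

# To write the file the notebook expects, uncomment:
# merged.to_csv("merged lobby.csv", index=False)
```

In this sketch a registration with several subject matters produces one row per subject matter.
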
| Column | Description |
|---|---|
| `GOVT_FUND_IND_FIN_GOUV` | Target variable: Government funding indicator (Y/N) |
| `SUBJ_MATTER_OBJET` | Subject matter of lobbying activity |
| `RGSTRNT_ADDRESS_ADRESSE_DCLRNT` | Registrant address (processed to city) |
| `RGSTRNT_TEL_DCLRNT` | Registrant phone (area code used as region proxy) |
| `PARENT_IND_SOC_MERE` | Parent company indicator |
| `COALITION_IND` | Coalition indicator |
| `SUBSIDIARY_IND_FILIALE` | Subsidiary indicator |
| `DIRECT_INT_IND_INT_DIRECT` | Direct interest indicator |
| `INSTITUTION` | Target government institution |
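
As an illustration of using the phone number's area code as a region proxy, here is a minimal sketch. It assumes North American 10-digit numbers with an optional leading country code `1`; the actual formatting in the export and the notebook's own parsing may differ, and `extract_area_code` is a name invented for this example.

```python
import re


def extract_area_code(phone):
    """Return the 3-digit area code from a North American phone string, or None."""
    digits = re.sub(r"\D", "", phone or "")
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]  # drop the leading country code
    # Anything that is not exactly 10 digits (extensions, blanks) is rejected.
    return digits[:3] if len(digits) == 10 else None


samples = ["(613) 555-0142", "416-555-0199", "1-604-555-0100", "n/a"]
area_codes = [extract_area_code(p) for p in samples]
print(area_codes)
```
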
- Clone the repository:
  `git clone https://github.com/davidyang02/LobbyingPredictionModel.git`
  `cd LobbyingPredictionModel`
- Install dependencies:
  `pip install -r requirements.txt`
- Obtain the primary dataset (see Data section above)
- Run the Jupyter notebook:
  `jupyter notebook lobbyingModel.ipynb`

| Model | Description |
|---|---|
| Logistic Regression | Baseline linear model |
| Random Forest Classifier | Ensemble learning with decision trees (best performer) |
| Gradient Boosting Classifier | Boosting-based decision tree model |
| XGBoost Classifier | Optimized gradient boosting |
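
A comparison across these model families could be sketched as below. This uses synthetic data from scikit-learn rather than the lobbying dataset, default-ish hyperparameters chosen for the example, and omits XGBoost to avoid the extra dependency, so the scores it prints say nothing about the results reported in this README.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data with a roughly 67/33 class split.
X, y = make_classification(
    n_samples=500, n_features=10, weights=[0.67, 0.33], random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "Gradient Boosting": GradientBoostingClassifier(random_state=0),
}

# Fit each model and record its F1 score on the held-out split.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = f1_score(y_test, model.predict(X_test))

for name, score in scores.items():
    print(f"{name}: F1 = {score:.3f}")
```
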
This project is open-source and available under the MIT License.


