<a href="https://colab.research.google.com/github/FestuMiles/classification-model-unza-publications/blob/main/Classification_of_unza_faculty_research_interests.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 1. Business Understanding
## Problem Statement
The University of Zambia (UNZA) produces numerous research publications annually across diverse faculties. However, these publications are not systematically classified according to Zambia’s Vision 2030 sector categories, making it challenging for policymakers, funding bodies, and administrators to assess how research aligns with national development priorities. Manual classification is time-consuming, inconsistent, and inefficient. There is a need for an automated system to classify research outputs into Vision 2030 sectors using only publication titles.

## Business Objectives
The objectives are to:
* Automate the classification of UNZA faculty research publications into Vision 2030 sector categories.
* Improve accessibility and searchability of research outputs by sector.
* Enable faster and more consistent reporting for stakeholders.

Success in real-world terms means that stakeholders can easily retrieve research outputs relevant to specific Vision 2030 sectors, and that the classification process is faster, more consistent, and requires minimal manual effort.

## Data Mining Goals
* We will collect and explore data on UNZA faculty members' publications.
* We will prepare the training and testing data, which includes cleaning, transforming, integrating, and formatting the data for modeling.
* We will build a supervised machine learning classification model that:
  * Takes a publication title as input.
  * Predicts the most likely Vision 2030 sector category (e.g., Agriculture, Energy, Health, Education).
  * Is trained and evaluated using labeled publication title data from UNZA faculty research.
* We will assess the quality and effectiveness of the model to ensure the results meet the business objectives.
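
The modeling goal above can be sketched as a small text-classification pipeline. This is only an illustration under stated assumptions: the titles and sector labels are hypothetical toy data, and TF-IDF with logistic regression (from scikit-learn) is one common baseline, not necessarily the model this project will use.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical toy examples; the real labeled titles are not shown here.
titles = [
    "Maize yield response to fertilizer in smallholder farms",
    "Soil fertility management for cassava production",
    "Malaria prevalence among children in rural clinics",
    "HIV treatment adherence in urban health centres",
    "Solar mini-grid adoption in off-grid communities",
    "Hydropower generation and load shedding trends",
]
sectors = ["Agriculture", "Agriculture", "Health", "Health", "Energy", "Energy"]

# TF-IDF turns each title into a weighted bag of words and word pairs;
# logistic regression then learns one set of weights per sector.
model = make_pipeline(
    TfidfVectorizer(lowercase=True, ngram_range=(1, 2)),
    LogisticRegression(max_iter=1000),
)
model.fit(titles, sectors)

# Takes a publication title as input and predicts a single sector label.
new_title = "Fertilizer subsidies and maize production"
predicted_sector = model.predict([new_title])[0]
print(predicted_sector)
```

With real data, the same pipeline would be fitted on the training split only and scored on the held-out test split.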

## Initial Project Success Criteria
* **Model Performance**: The classification model should achieve at least 80% accuracy on unseen test data.
* **Usability**: The system should output results in a clear and interpretable format for non-technical users.
* **Practical Value**: The automated classification should reduce the manual categorization time by at least 50% compared to the current process.
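
The two quantitative criteria above can be checked with a few lines of arithmetic. The sketch below is a hedged illustration: the function name `meets_success_criteria`, the label lists, and the timing figures are all hypothetical and are not measurements from this project.

```python
def meets_success_criteria(y_true, y_pred, manual_minutes, automated_minutes,
                           min_accuracy=0.80, min_time_reduction=0.50):
    """Check the accuracy and time-reduction thresholds (hypothetical helper)."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    # Accuracy: share of test titles whose predicted sector matches the label.
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    accuracy = correct / len(y_true)
    # Time reduction: fraction of manual categorization time that is saved.
    time_reduction = 1 - automated_minutes / manual_minutes
    return {
        "accuracy": accuracy,
        "time_reduction": time_reduction,
        "accuracy_ok": accuracy >= min_accuracy,
        "time_ok": time_reduction >= min_time_reduction,
    }


# Hypothetical test labels and timings, for illustration only.
y_true = ["Health", "Energy", "Health", "Agriculture", "Education"]
y_pred = ["Health", "Energy", "Health", "Agriculture", "Health"]
report = meets_success_criteria(y_true, y_pred,
                                manual_minutes=120, automated_minutes=45)
print(report)
```

Accuracy alone can hide poor performance on rare sectors, so per-sector metrics may be worth reporting alongside it if the classes turn out to be imbalanced.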




# Data Understanding

In [None]:
import pandas as pd

In [None]:
# Mount Google Drive so that created files are saved
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [None]:
# Navigate to the shared team project folder.
# Note: first create a shortcut to the shared folder in your Drive root directory.
%cd "/content/drive/My Drive/misc-unza25-csc4792-project_team21"

/content/drive/.shortcut-targets-by-id/1rE8kSkQXl-SiU24RkWcyCi02p-dW0Y-1/misc-unza25-csc4792-project_team21


In [None]:
!ls

Classification_of_unza_faculty_research_interests.ipynb
unza_faculty_publications_details.csv


In [None]:
pub_details = pd.read_csv('unza_faculty_publications_details.csv')

In [None]:
pub_details.head()

Unnamed: 0,title,authors,year,venue,abstract,num_citations,url_scholarbib
0,Zambezi voice: A multilingual speech corpus fo...,"C Sikasote, K Siaminwe, S Mwape, B Zulu",2023.0,arXiv preprint arXiv …,for all the seven official native languages of...,5,https://arxiv.org/abs/2306.04428
1,BembaSpeech: A speech recognition corpus for t...,"C Sikasote, A Anastasopoulos",2021.0,arXiv preprint arXiv:2102.04889,impendwa ya bantu ba mu Zambia ukufika cipendo...,20,https://arxiv.org/abs/2102.04889
2,Big-c: a multimodal multi-purpose dataset for ...,"C Sikasote, E Mukonde, MMI Alam",2023.0,arXiv preprint arXiv …,We present BIG-C (Bemba Image Grounded Convers...,6,https://arxiv.org/abs/2305.17202
3,Evaluating DICOM Compliance for Medical Images...,"E Chileshe, MC Sikasote, L Phiri",,,This paper focuses on a detailed examination o...,0,https://datalab.unza.zm/sites/default/files/20...
4,Implementation of the Sustainable Development ...,"J Lubbungu, C Pailet, K Shameenda",,… OF PUBLIC UNIVERSITIES …,"of entrepreneurial universities in the world, ...",0,https://www.zapuc.edu.zm/docs/2018_ZAPUC_Confe...
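
The preview above shows an `Unnamed: 0` column, which typically appears when a CSV was saved together with its row index, and some rows with no `year` or `venue`. A minimal sketch of how these could be checked follows. It uses a hypothetical two-row CSV held in memory instead of the project file, and the titles in it are placeholders.

```python
import io

import pandas as pd

# Hypothetical CSV with a subset of the columns shown above;
# the first header field is empty because it holds the saved row index.
csv_text = (
    ",title,year,num_citations\n"
    "0,Example title A,2023.0,5\n"
    "1,Example title B,,0\n"
)

# index_col=0 reads the unnamed first column as the row index
# instead of creating an "Unnamed: 0" column.
sample = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(sample.shape)                         # rows and columns
print(sample.isna().sum())                  # missing values per column
print(sample["title"].duplicated().sum())   # repeated titles
```

The same three checks can be run on `pub_details` to see how many titles are missing or duplicated before any labeling or modeling starts.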
