# Problem Description
The goal is to hire the best intern from the given dataset. The dataset consists of 14 columns, and we need to evaluate the interns based on their scores in various categories and select the intern with the highest aggregate score.

## Approach
The approach is to give each category or feature a maximum aggregate (its value scale multiplied by a weight) and to sum these aggregates into a single score for each intern. The aggregates were intended to total 10; the maxima listed below add up to 11.

The relevant columns considered for evaluation are:

- **Python** (Aggregate: 1.5)
- **Machine Learning** (Aggregate: 1.8)
- **Deep Learning** (Aggregate: 1.8)
- **Natural Language Processing (NLP)** (Aggregate: 1.5)
- **Other Skills** (Aggregate: 0.5)
- **Availability** (Aggregate: 0.5)
- **Degree** (Aggregate: 0.5)
- **Stream** (Aggregate: 1.4)
- **Year of Graduation** (Aggregate: 0.5)
- **Study** (Aggregate: 1)
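
As a quick sanity check, the aggregates listed above can be tallied in a few lines. This is only a sketch; the dictionary name `aggregates` is introduced here for illustration and does not appear in the notebook code.

```python
# Maximum aggregate per category, copied from the list above.
aggregates = {
    "Python": 1.5,
    "Machine Learning": 1.8,
    "Deep Learning": 1.8,
    "Natural Language Processing (NLP)": 1.5,
    "Other Skills": 0.5,
    "Availability": 0.5,
    "Degree": 0.5,
    "Stream": 1.4,
    "Year of Graduation": 0.5,
    "Study": 1.0,
}

# Round to two decimals to hide floating-point noise in the sum.
total = round(sum(aggregates.values()), 2)
print(total)  # 11.0
```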

The main challenge in the dataset is handling missing values in the **Degree** and **Stream** columns. In both columns, missing values are replaced with "Unknown" so that the affected rows are retained for evaluation.

### Assigning values and weights to each column

Specific values are assigned based on relevance to data science and data analytics internships.

- **Python**, **Machine Learning**, **Deep Learning**, and **Natural Language Processing (NLP)** columns are left as they are, on a scale of 0 to 3.

- For each extra skill mentioned in the **Other Skills** column, a value of 0.03 is assigned.

- For **Study Aggregate**:
  - For 10th grade, a value of 0.05 is assigned.
  - For 12th grade, a value of 0.1 is assigned.
  - For UG, a value of 0.3 is assigned.
  - For PG, a value of 0.05 is assigned.
  - The total value being 0.5, the associated weight should be 2 so that the aggregate is 1.


- For **Stream Aggregate**:
    - Because studying in a relevant stream is important, an aggregate of 1.4 is chosen.
    - Streams are given priority in the order AI/ML/DS > CSE > IT, IoT > ECE, Mechanical > Metallurgy, Pharmacy.
    - Values and the weight are chosen from keywords so that the maximum aggregate is 1.4.

- For **Degree Aggregate**:
    - Degrees are given priority in the order Master's > Bachelor's.
    - Values and the weight are chosen from keywords so that the aggregate is 0.5.


- For **Python Aggregate**:
    - Values are taken directly from the dataset (on a scale of 0 to 3).
    - The associated weight is 0.5.
    - The total aggregate is 1.5.

- For **Machine Learning Aggregate**:
    - Values are taken directly from the dataset (on a scale of 0 to 3).
    - The associated weight is 0.6.
    - The total aggregate is 1.8.

- For **NLP Aggregate**:
    - Values are taken directly from the dataset (on a scale of 0 to 3).
    - The associated weight is 0.5.
    - The total aggregate is 1.5.

- For **Deep Learning Aggregate**:
    - Values are taken directly from the dataset (on a scale of 0 to 3).
    - The associated weight is 0.6.
    - The total aggregate is 1.8.

- For **Availability Aggregate**:
    - If the candidate is available immediately for a 3-month internship, a value of 1 is assigned; otherwise 0.
    - The associated weight is 0.5.
    - The total aggregate is 0.5.

- For **Year of Graduation Aggregate**:
    - 2024 graduates are preferred, so the value is `3 - abs(Current Year of Graduation - 2024)`, which falls in the range 0 to 3 for graduation years from 2021 to 2027.
    - A weight of 0.166 is therefore assigned.
    - The total aggregate is 0.5.

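- For **Other Skills Aggregate**:
    - Each extra skill is assigned the same fixed value (0.03, as stated above). In the code, the number of skills is divided by the largest number of skills listed by any candidate, so the value lies between 0 and 1.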
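
The relationship between value scale, weight, and aggregate described in the bullets above can be checked with a short sketch. The pairs below are read from the text and from the weights used in the scoring cell; the names `scales` and `max_aggregates` are illustrative only.

```python
# (maximum value, weight) pairs for a few of the categories described above.
scales = {
    "Python": (3, 0.5),
    "Machine Learning": (3, 0.6),   # 1.8 / 3, the weight used in the scoring cell
    "NLP": (3, 0.5),
    "Availability": (1, 0.5),
    "Year of Graduation": (3, 0.166),
    "Study": (0.5, 2),
}

# Maximum aggregate = maximum value * weight (rounded to hide float noise).
max_aggregates = {
    name: round(max_value * weight, 3)
    for name, (max_value, weight) in scales.items()
}
print(max_aggregates)
```

With these inputs the Year of Graduation aggregate comes out at 0.498, slightly under the nominal 0.5, because the weight 0.166 is a rounded 1/6.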

## Scoring and Ranking Metrics
- Assign aggregates to each feature/category based on its importance and relevance to the hiring decision.
- Calculate the score for each intern by multiplying the feature values by their respective weights and summing the results.
- Rank the interns based on their aggregate scores.
- Select the intern with the highest aggregate score as the best candidate for hiring.
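
The scoring and ranking steps above can be sketched without pandas as follows. The three candidates and the reduced feature set are hypothetical and not taken from the dataset; only the weights 0.5, 0.6, and 0.5 match those used later in the notebook.

```python
# Toy feature values for three hypothetical candidates.
candidates = [
    {"id": "A", "python": 3, "ml": 2, "availability": 1},
    {"id": "B", "python": 2, "ml": 3, "availability": 0},
    {"id": "C", "python": 3, "ml": 3, "availability": 1},
]
weights = {"python": 0.5, "ml": 0.6, "availability": 0.5}

def score(candidate):
    """Weighted sum of the candidate's feature values."""
    return sum(candidate[feature] * weight for feature, weight in weights.items())

# Highest score first; the first entry is the selected candidate.
ranked = sorted(candidates, key=score, reverse=True)
best = ranked[0]
print([c["id"] for c in ranked])  # ['C', 'A', 'B']
```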

By following this approach, we can effectively evaluate the interns and identify the best candidate for the internship opportunity.

Please note that these aggregates and weights are adjustable and subjective based on the specific requirements and preferences of the evaluation criteria. Feel free to further customize them as per your specific needs.


In [138]:
# importing required libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

## Data Preparation:

In [139]:
# Reading CSV File
df = pd.read_csv('/home/itachi/Documents/data.csv')
df

Unnamed: 0,Name,Python (out of 3),Machine Learning (out of 3),Natural Language Processing (NLP) (out of 3),Deep Learning (out of 3),Other skills,"Are you available for 3 months, starting immediately, for a full-time work from home internship?",Degree,Stream,Current Year Of Graduation,Performance_PG,Performance_UG,Performance_12,Performance_10
0,,1,0,0,1,"MS-Excel, MS-Word, Deep Learning, MySQL, Pytho...","Yes, I am available for 3 months starting imme...",Bachelor of Vocation (B.Voc.),Software Engineering,2021,,6.50/7,,
1,,2,0,0,0,"Git, GitHub, Linux, Adobe After Effects, Adobe...","Yes, I am available for 3 months starting imme...",B.Tech,Computer Science & Engineering,2024,,8.90/10,,
2,,2,2,0,0,"Amazon Web Services (AWS), Docker, Hadoop, MS-...","Yes, I am available for 3 months starting imme...",Master of Science (M.S.),Data Science And Analytics,2022,,,,
3,,3,2,2,0,"Adobe XD, BIG DATA ANALYTICS, Canva, Data Anal...","Yes, I am available for 3 months starting imme...",Bachelor of Engineering (B.E),,2024,,,85.60/85.60,10.00/10.00
4,,2,2,0,0,"C++ Programming, Data Science, Machine Learnin...","Yes, I am available for 3 months starting imme...",B.Tech,Computer Science,2023,,8.10/10,93.40/93.40,10.00/10.00
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1131,,2,2,0,2,"Data Analytics, Amazon Web Services (AWS), Dat...","Yes, I am available for 3 months starting imme...",B.Tech,Mechanical Engineering,2021,,,,
1132,,3,3,2,3,"Deep Learning, Docker, HTML, MS-Office, Machin...","Yes, I am available for 3 months starting imme...",B.Tech,Computer Science & Engineering,2024,,,,
1133,,3,1,3,3,"Data Science, Deep Learning, English Proficien...","Yes, I am available for 3 months starting imme...",B.Tech,Electronics and Communication,2025,,8.77/10,9.40/9.40,
1134,,2,1,0,0,"Python, Data Analytics, MS-Excel, Machine Lear...","Yes, I am available for 3 months starting imme...",B.Tech,Computer Science,2024,,7.90/10,90.00/90.00,


In [140]:
# Checking for data types of items present in each column
result = df.dtypes
print(result)

Name                                                                                                 float64
Python (out of 3)                                                                                      int64
Machine Learning (out of 3)                                                                            int64
Natural Language Processing (NLP) (out of 3)                                                           int64
Deep Learning (out of 3)                                                                               int64
Other skills                                                                                          object
Are you available for 3 months, starting immediately, for a full-time work from home internship?      object
Degree                                                                                                object
Stream                                                                                                object
Current Year Of Gra

In [141]:
# Checking how many null values are present in each column
df.isnull().sum()

Name                                                                                                 1136
Python (out of 3)                                                                                       0
Machine Learning (out of 3)                                                                             0
Natural Language Processing (NLP) (out of 3)                                                            0
Deep Learning (out of 3)                                                                                0
Other skills                                                                                           66
Are you available for 3 months, starting immediately, for a full-time work from home internship?        0
Degree                                                                                                 43
Stream                                                                                                170
Current Year Of Graduation                    

In [142]:
# Filling NaN values with placeholder keywords
# (assigning the result back avoids calling fillna(inplace=True) on a column selection)
df['Degree'] = df['Degree'].fillna('Unknown')
df['Stream'] = df['Stream'].fillna('Unknown')
df['Other skills'] = df['Other skills'].fillna('No additional skill')
df['Performance_10'] = df['Performance_10'].fillna('0/0')
df['Performance_12'] = df['Performance_12'].fillna('0/0')
df['Performance_PG'] = df['Performance_PG'].fillna('0/0')
df['Performance_UG'] = df['Performance_UG'].fillna('0/0')
df.isnull().sum()

Name                                                                                                 1136
Python (out of 3)                                                                                       0
Machine Learning (out of 3)                                                                             0
Natural Language Processing (NLP) (out of 3)                                                            0
Deep Learning (out of 3)                                                                                0
Other skills                                                                                            0
Are you available for 3 months, starting immediately, for a full-time work from home internship?        0
Degree                                                                                                  0
Stream                                                                                                  0
Current Year Of Graduation                    

In [143]:
# data frame after filling null values 
df.head(5)

Unnamed: 0,Name,Python (out of 3),Machine Learning (out of 3),Natural Language Processing (NLP) (out of 3),Deep Learning (out of 3),Other skills,"Are you available for 3 months, starting immediately, for a full-time work from home internship?",Degree,Stream,Current Year Of Graduation,Performance_PG,Performance_UG,Performance_12,Performance_10
0,,1,0,0,1,"MS-Excel, MS-Word, Deep Learning, MySQL, Pytho...","Yes, I am available for 3 months starting imme...",Bachelor of Vocation (B.Voc.),Software Engineering,2021,0/0,6.50/7,0/0,0/0
1,,2,0,0,0,"Git, GitHub, Linux, Adobe After Effects, Adobe...","Yes, I am available for 3 months starting imme...",B.Tech,Computer Science & Engineering,2024,0/0,8.90/10,0/0,0/0
2,,2,2,0,0,"Amazon Web Services (AWS), Docker, Hadoop, MS-...","Yes, I am available for 3 months starting imme...",Master of Science (M.S.),Data Science And Analytics,2022,0/0,0/0,0/0,0/0
3,,3,2,2,0,"Adobe XD, BIG DATA ANALYTICS, Canva, Data Anal...","Yes, I am available for 3 months starting imme...",Bachelor of Engineering (B.E),Unknown,2024,0/0,0/0,85.60/85.60,10.00/10.00
4,,2,2,0,0,"C++ Programming, Data Science, Machine Learnin...","Yes, I am available for 3 months starting imme...",B.Tech,Computer Science,2023,0/0,8.10/10,93.40/93.40,10.00/10.00


In [144]:
# Converting Other skills to a 0-1 scale:
# number of skills listed, divided by the largest number listed by any candidate
max_skills = 1
for item in df['Other skills']:
    max_skills = max(len(item.split(',')), max_skills)
df['Other skills_Aggregate'] = df['Other skills'].apply(lambda x: (1/max_skills)*len(x.split(',')))
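
The scaling in this cell counts comma-separated entries, so the placeholder `No additional skill` is counted as one skill. A minimal sketch of a variant that counts the placeholder as zero is shown below; the helper `skill_count` and the sample strings are hypothetical, and this is not what the notebook cell does.

```python
PLACEHOLDER = "No additional skill"  # value used when filling missing entries

def skill_count(entry):
    """Number of comma-separated skills; the placeholder counts as zero."""
    if entry.strip() == PLACEHOLDER:
        return 0
    return len([skill for skill in entry.split(",") if skill.strip()])

# Hypothetical sample entries, not taken from the dataset.
skills = [
    "Python, SQL, Git",
    "No additional skill",
    "Docker, Hadoop, Python, SQL, Linux, Git",
]
counts = [skill_count(entry) for entry in skills]
largest = max(max(counts), 1)            # guard against division by zero
normalised = [count / largest for count in counts]
print(counts)      # [3, 0, 6]
print(normalised)  # [0.5, 0.0, 1.0]
```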

In [145]:
# Data frame after scaling Other skills to a value
df.head(5)

Unnamed: 0,Name,Python (out of 3),Machine Learning (out of 3),Natural Language Processing (NLP) (out of 3),Deep Learning (out of 3),Other skills,"Are you available for 3 months, starting immediately, for a full-time work from home internship?",Degree,Stream,Current Year Of Graduation,Performance_PG,Performance_UG,Performance_12,Performance_10,Other skills_Aggregate
0,,1,0,0,1,"MS-Excel, MS-Word, Deep Learning, MySQL, Pytho...","Yes, I am available for 3 months starting imme...",Bachelor of Vocation (B.Voc.),Software Engineering,2021,0/0,6.50/7,0/0,0/0,0.095238
1,,2,0,0,0,"Git, GitHub, Linux, Adobe After Effects, Adobe...","Yes, I am available for 3 months starting imme...",B.Tech,Computer Science & Engineering,2024,0/0,8.90/10,0/0,0/0,0.52381
2,,2,2,0,0,"Amazon Web Services (AWS), Docker, Hadoop, MS-...","Yes, I am available for 3 months starting imme...",Master of Science (M.S.),Data Science And Analytics,2022,0/0,0/0,0/0,0/0,0.126984
3,,3,2,2,0,"Adobe XD, BIG DATA ANALYTICS, Canva, Data Anal...","Yes, I am available for 3 months starting imme...",Bachelor of Engineering (B.E),Unknown,2024,0/0,0/0,85.60/85.60,10.00/10.00,0.285714
4,,2,2,0,0,"C++ Programming, Data Science, Machine Learnin...","Yes, I am available for 3 months starting imme...",B.Tech,Computer Science,2023,0/0,8.10/10,93.40/93.40,10.00/10.00,0.063492


In [146]:
print("--------------- UNIQUE VALUES IN DEGREE COLUMN ------------------")
s = set(df['Degree'].values)
for item in s:
    print(item)
print("NUMBER OF UNIQUE DEGREE'S :",len(s))

--------------- UNIQUE VALUES IN DEGREE COLUMN ------------------
Bachelor of Computer Applications (BCA) (Hons.)
Integrated B.E & MBA
Post Graduate Diploma In Data Analytics And Machine Learning
DIPLOMA IN CIVIL Engineering
Master of Engineering (M.E)
Bachelor of Artificial Intelligence
M.Sc. in Data Science
Msc.Data Analytics
Integrated M.Sc.
Bachelor of Commerce (B.Com)
Integrated B.Tech & M.Sc
M.Sc (Information Technology)
Bachelor of Chemical Engineering
B.sc(chem)
B.A
Post Graduate Programme (PGP)
Integrated B.E & M.Sc
PG Diploma in Data Science
Business Engineering (BE)
B.E Computer Science and Engineering (Artificial Intelligence and machine Learning)
BE
Master of Information Technology (M.I.T.)
Master of Science (M.Sc) (Tech)
Bachelor Of Technology (B.Tech) CS
Bachelor of Arts (B.A.)
Post Graduate Diploma in Management (P.G.D.M.)
Bachelor of Computer Engineering
Integrated MCA
B.Tech
Post Graduate Diploma in Human Resource Management (P.G.D.H.R.M.)
Diploma In Pharmacy
Hsc
Btec

In [147]:
# Assigning values to Degree based on keywords
keywords_1 = ['bachelor','b.tech','btech']
keywords_2 = ['master', 'm.tech','mtech']
keywords_3 = ['phd','ph.d']   # lower case, because each word is lower-cased before matching
keywords_4 = ['ds','artificial','ai','data','computer','statistics']
keywords_5 = ['missing','unknown']

degree_values = []
for item in df['Degree']:
    words = item.split(' ')   # renamed from `list` so the built-in is not shadowed
    value = 0.1               # base value given to every degree
    for keyword in words:
        if keyword.lower() in keywords_1:
            value += 0.1
        elif keyword.lower() in keywords_2:
            value += 0.2
        elif keyword.lower() in keywords_3:
            value += 0.3
        elif keyword.lower() in keywords_4:
            value += 0.1
        elif keyword.lower() in keywords_5:
            value -= 0.1      # 'Unknown' cancels the base value
    degree_values.append(value)
df['Degree_value'] = degree_values
df.head(5)

Unnamed: 0,Name,Python (out of 3),Machine Learning (out of 3),Natural Language Processing (NLP) (out of 3),Deep Learning (out of 3),Other skills,"Are you available for 3 months, starting immediately, for a full-time work from home internship?",Degree,Stream,Current Year Of Graduation,Performance_PG,Performance_UG,Performance_12,Performance_10,Other skills_Aggregate,Degree_value
0,,1,0,0,1,"MS-Excel, MS-Word, Deep Learning, MySQL, Pytho...","Yes, I am available for 3 months starting imme...",Bachelor of Vocation (B.Voc.),Software Engineering,2021,0/0,6.50/7,0/0,0/0,0.095238,0.2
1,,2,0,0,0,"Git, GitHub, Linux, Adobe After Effects, Adobe...","Yes, I am available for 3 months starting imme...",B.Tech,Computer Science & Engineering,2024,0/0,8.90/10,0/0,0/0,0.52381,0.2
2,,2,2,0,0,"Amazon Web Services (AWS), Docker, Hadoop, MS-...","Yes, I am available for 3 months starting imme...",Master of Science (M.S.),Data Science And Analytics,2022,0/0,0/0,0/0,0/0,0.126984,0.3
3,,3,2,2,0,"Adobe XD, BIG DATA ANALYTICS, Canva, Data Anal...","Yes, I am available for 3 months starting imme...",Bachelor of Engineering (B.E),Unknown,2024,0/0,0/0,85.60/85.60,10.00/10.00,0.285714,0.2
4,,2,2,0,0,"C++ Programming, Data Science, Machine Learnin...","Yes, I am available for 3 months starting imme...",B.Tech,Computer Science,2023,0/0,8.10/10,93.40/93.40,10.00/10.00,0.063492,0.2
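
Splitting on spaces leaves punctuation attached to words, so a token such as `(M.Tech)` does not match the keyword `m.tech`. The sketch below shows one possible tokenisation that strips brackets while keeping dots; the helper name `tokens` and the regular expression are assumptions for illustration, and using it would change the `Degree_value` figures shown above.

```python
import re

def tokens(text):
    """Lower-case tokens made of letters, digits and dots; brackets are dropped."""
    return re.findall(r"[a-z0-9.]+", text.lower())

degree = "Master of Technology (M.Tech)"
naive = degree.lower().split(" ")   # what a plain split on spaces produces
cleaned = tokens(degree)

print(naive)    # ['master', 'of', 'technology', '(m.tech)']
print(cleaned)  # ['master', 'of', 'technology', 'm.tech']
print("m.tech" in naive, "m.tech" in cleaned)  # False True
```

With this tokenisation a degree such as the one above would match both `master` and `m.tech`, so the keyword values would need to be revisited to avoid counting the same qualification twice.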


In [148]:
print("--------------- UNIQUE VALUES IN STREAM COLUMN ------------------")
s = set(df['Stream'].values)
for item in s:
    print(item)
print("NUMBER OF UNIQUE STREAMS: " ,len(s))

--------------- UNIQUE VALUES IN STREAM COLUMN ------------------
Machine Learning
Post Graduate Diploma In Big Data Analytics
Information science and engineering
Agriculture Management
Cs(aiml)
Food Technology
AI & DATA SCIENCE
Computer  Science  And Engineering
Microbiology
Analytics And Big Data
Commerce
Minor Specialization In AI & ML (Hons.)
Engineering
computer science
Finance
Chemical Engineering
Computer Science & Information Technology
Engineering Physics
Computer Science With Specialization In AI And ML
Software Development
Artificial Intellience & Data Science
Artificial Intelligence And Data Science
Computer Application(Data Science)
Data Analytics
Computer Science & Technology
Computer Science & Engineering ( Machine Learning)
Computer  Science And Engineering
B.E. Computer
Chemistry
Nanotechnology
Computational Data Science
Data Science And Engineering
Computer Science And Engineering (Data Science)
Mathemtaics With Computer Science
Computer Science And Technology
Metallu

In [149]:
# Assigning values to Stream based on keywords
keywords_1 = ['ece','mechanical','electronics','physics','pharmacy','mineral','metallurgy','economics']
keywords_2 = ['iot','it','software','statistics']
keywords_3 = ['ds','artificial','ai','data','computer','ml']
keywords_5 = ['unknown','missing']

stream_values = []
for item in df['Stream']:
    words = item.split(' ')   # renamed from `list` so the built-in is not shadowed
    value = 0.0
    for keyword in words:
        if keyword.lower() in keywords_1:
            value += 0.3
        elif keyword.lower() in keywords_2:
            value += 0.4
        elif keyword.lower() in keywords_3:
            value += 0.7
        elif keyword.lower() in keywords_5:
            value -= 0.05
    # Cap the value so that several matching keywords cannot exceed 0.7
    if value > 0.7:
        value = 0.7
    stream_values.append(value)
df['Stream_value'] = stream_values
df.head(5)

Unnamed: 0,Name,Python (out of 3),Machine Learning (out of 3),Natural Language Processing (NLP) (out of 3),Deep Learning (out of 3),Other skills,"Are you available for 3 months, starting immediately, for a full-time work from home internship?",Degree,Stream,Current Year Of Graduation,Performance_PG,Performance_UG,Performance_12,Performance_10,Other skills_Aggregate,Degree_value,Stream_value
0,,1,0,0,1,"MS-Excel, MS-Word, Deep Learning, MySQL, Pytho...","Yes, I am available for 3 months starting imme...",Bachelor of Vocation (B.Voc.),Software Engineering,2021,0/0,6.50/7,0/0,0/0,0.095238,0.2,0.4
1,,2,0,0,0,"Git, GitHub, Linux, Adobe After Effects, Adobe...","Yes, I am available for 3 months starting imme...",B.Tech,Computer Science & Engineering,2024,0/0,8.90/10,0/0,0/0,0.52381,0.2,0.7
2,,2,2,0,0,"Amazon Web Services (AWS), Docker, Hadoop, MS-...","Yes, I am available for 3 months starting imme...",Master of Science (M.S.),Data Science And Analytics,2022,0/0,0/0,0/0,0/0,0.126984,0.3,0.7
3,,3,2,2,0,"Adobe XD, BIG DATA ANALYTICS, Canva, Data Anal...","Yes, I am available for 3 months starting imme...",Bachelor of Engineering (B.E),Unknown,2024,0/0,0/0,85.60/85.60,10.00/10.00,0.285714,0.2,-0.05
4,,2,2,0,0,"C++ Programming, Data Science, Machine Learnin...","Yes, I am available for 3 months starting imme...",B.Tech,Computer Science,2023,0/0,8.10/10,93.40/93.40,10.00/10.00,0.063492,0.2,0.7
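
To see how the keyword values and the 0.7 cap interact, the same rule can be written as a small function and tried on a few stream names from the listing above. The names `STREAM_GROUPS` and `stream_value` are introduced here for illustration; the keyword groups and values are the ones used in the cell.

```python
# (keywords, value) groups, checked in the same order as in the cell above.
STREAM_GROUPS = [
    ({"ece", "mechanical", "electronics", "physics", "pharmacy",
      "mineral", "metallurgy", "economics"}, 0.3),
    ({"iot", "it", "software", "statistics"}, 0.4),
    ({"ds", "artificial", "ai", "data", "computer", "ml"}, 0.7),
    ({"unknown", "missing"}, -0.05),
]

def stream_value(stream, cap=0.7):
    """Sum the value of every matching word, then cap the result."""
    value = 0.0
    for word in stream.lower().split(" "):
        for keywords, points in STREAM_GROUPS:
            if word in keywords:
                value += points
                break
    return min(value, cap)

print(stream_value("Computer Science & Engineering"))  # 0.7
print(stream_value("AI & DATA SCIENCE"))               # 0.7 (1.4 before the cap)
print(stream_value("Software Engineering"))            # 0.4
print(stream_value("Unknown"))                         # -0.05
```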


In [150]:
# Assigning a value of 1 if the candidate is available to join immediately, else 0 (the weight of 0.5 is applied later)
Li = []
for item in df['Are you available for 3 months, starting immediately, for a full-time work from home internship? ']:
    if 'yes' in item.lower().split(','):
        Li.append(1)
    else:
        Li.append(0)
df['Availability_value'] = Li
df.head(5)

Unnamed: 0,Name,Python (out of 3),Machine Learning (out of 3),Natural Language Processing (NLP) (out of 3),Deep Learning (out of 3),Other skills,"Are you available for 3 months, starting immediately, for a full-time work from home internship?",Degree,Stream,Current Year Of Graduation,Performance_PG,Performance_UG,Performance_12,Performance_10,Other skills_Aggregate,Degree_value,Stream_value,Availability_value
0,,1,0,0,1,"MS-Excel, MS-Word, Deep Learning, MySQL, Pytho...","Yes, I am available for 3 months starting imme...",Bachelor of Vocation (B.Voc.),Software Engineering,2021,0/0,6.50/7,0/0,0/0,0.095238,0.2,0.4,1
1,,2,0,0,0,"Git, GitHub, Linux, Adobe After Effects, Adobe...","Yes, I am available for 3 months starting imme...",B.Tech,Computer Science & Engineering,2024,0/0,8.90/10,0/0,0/0,0.52381,0.2,0.7,1
2,,2,2,0,0,"Amazon Web Services (AWS), Docker, Hadoop, MS-...","Yes, I am available for 3 months starting imme...",Master of Science (M.S.),Data Science And Analytics,2022,0/0,0/0,0/0,0/0,0.126984,0.3,0.7,1
3,,3,2,2,0,"Adobe XD, BIG DATA ANALYTICS, Canva, Data Anal...","Yes, I am available for 3 months starting imme...",Bachelor of Engineering (B.E),Unknown,2024,0/0,0/0,85.60/85.60,10.00/10.00,0.285714,0.2,-0.05,1
4,,2,2,0,0,"C++ Programming, Data Science, Machine Learnin...","Yes, I am available for 3 months starting imme...",B.Tech,Computer Science,2023,0/0,8.10/10,93.40/93.40,10.00/10.00,0.063492,0.2,0.7,1
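
The availability check above looks for `yes` as the first comma-separated part of the answer. A slightly more tolerant variant, sketched below under the assumption that every positive answer begins with "yes", also accepts answers without a comma. The function name `availability_value` and the shortened first sample answer are hypothetical; the second sample is an answer that appears in the ranked output.

```python
def availability_value(answer):
    """Return 1 when the answer begins with 'yes' (case-insensitive), else 0."""
    return 1 if answer.strip().lower().startswith("yes") else 0

answers = ["Yes, I am available", "No, From 1st august.", "yes"]
values = [availability_value(answer) for answer in answers]
print(values)  # [1, 0, 1]
```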


In [151]:
# Assigning values for year of graduation
List = []
# 2024 graduates get the highest value (3); each year of distance from 2024 subtracts 1
for item in df['Current Year Of Graduation']:
    List.append(3-(abs(int(item)-2024)))
df['Current Year Of Graduation_value'] = List
df.head(5)

Unnamed: 0,Name,Python (out of 3),Machine Learning (out of 3),Natural Language Processing (NLP) (out of 3),Deep Learning (out of 3),Other skills,"Are you available for 3 months, starting immediately, for a full-time work from home internship?",Degree,Stream,Current Year Of Graduation,Performance_PG,Performance_UG,Performance_12,Performance_10,Other skills_Aggregate,Degree_value,Stream_value,Availability_value,Current Year Of Graduation_value
0,,1,0,0,1,"MS-Excel, MS-Word, Deep Learning, MySQL, Pytho...","Yes, I am available for 3 months starting imme...",Bachelor of Vocation (B.Voc.),Software Engineering,2021,0/0,6.50/7,0/0,0/0,0.095238,0.2,0.4,1,0
1,,2,0,0,0,"Git, GitHub, Linux, Adobe After Effects, Adobe...","Yes, I am available for 3 months starting imme...",B.Tech,Computer Science & Engineering,2024,0/0,8.90/10,0/0,0/0,0.52381,0.2,0.7,1,3
2,,2,2,0,0,"Amazon Web Services (AWS), Docker, Hadoop, MS-...","Yes, I am available for 3 months starting imme...",Master of Science (M.S.),Data Science And Analytics,2022,0/0,0/0,0/0,0/0,0.126984,0.3,0.7,1,1
3,,3,2,2,0,"Adobe XD, BIG DATA ANALYTICS, Canva, Data Anal...","Yes, I am available for 3 months starting imme...",Bachelor of Engineering (B.E),Unknown,2024,0/0,0/0,85.60/85.60,10.00/10.00,0.285714,0.2,-0.05,1,3
4,,2,2,0,0,"C++ Programming, Data Science, Machine Learnin...","Yes, I am available for 3 months starting imme...",B.Tech,Computer Science,2023,0/0,8.10/10,93.40/93.40,10.00/10.00,0.063492,0.2,0.7,1,2
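
The formula `3 - abs(year - 2024)` becomes negative for graduation years more than three years away from 2024. The sketch below shows the formula as used in the cell alongside a clipped variant that keeps the value between 0 and 3; the clipping is an assumption added for illustration and is not part of the notebook.

```python
def graduation_value(year, target=2024, top=3):
    """Value used in the cell above: top minus the distance from the target year."""
    return top - abs(int(year) - target)

def clipped_graduation_value(year, target=2024, top=3):
    """Same value, but never below zero (illustrative variant)."""
    return max(0, graduation_value(year, target, top))

for year in (2021, 2022, 2024, 2025, 2019):
    print(year, graduation_value(year), clipped_graduation_value(year))
# 2021 0 0
# 2022 1 1
# 2024 3 3
# 2025 2 2
# 2019 -2 0
```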


In [152]:
# Assigning values to performance in studies
def marks_value(entry, cgpa_weight, percent_weight):
    """Scale one 'obtained/maximum' entry; scores <= 10 are treated as CGPA, others as percentages."""
    obtained = float(entry.split('/')[0])   # the stated maximum is not used
    if obtained <= 10:
        return (obtained / 10) * cgpa_weight
    return (obtained / 100) * percent_weight

# (weight on the CGPA scale, weight on the percentage scale) for each column.
# Note: these weights differ from the 10th/12th/UG/PG split of 0.05/0.1/0.3/0.05
# given in the Approach section.
study_weights = {
    'Performance_10': (0.05, 0.05),
    'Performance_12': (0.05, 0.1),
    'Performance_UG': (0.05, 0.05),
    'Performance_PG': (0.1, 0.1),
}
study_parts = {col: df[col].apply(marks_value, args=w) for col, w in study_weights.items()}
for part in study_parts.values():
    print(len(part))
df['Studies_value'] = sum(study_parts.values())
df.head(5)

1136
1136
1136
1136


Unnamed: 0,Name,Python (out of 3),Machine Learning (out of 3),Natural Language Processing (NLP) (out of 3),Deep Learning (out of 3),Other skills,"Are you available for 3 months, starting immediately, for a full-time work from home internship?",Degree,Stream,Current Year Of Graduation,Performance_PG,Performance_UG,Performance_12,Performance_10,Other skills_Aggregate,Degree_value,Stream_value,Availability_value,Current Year Of Graduation_value,Studies_value
0,,1,0,0,1,"MS-Excel, MS-Word, Deep Learning, MySQL, Pytho...","Yes, I am available for 3 months starting imme...",Bachelor of Vocation (B.Voc.),Software Engineering,2021,0/0,6.50/7,0/0,0/0,0.095238,0.2,0.4,1,0,0.0325
1,,2,0,0,0,"Git, GitHub, Linux, Adobe After Effects, Adobe...","Yes, I am available for 3 months starting imme...",B.Tech,Computer Science & Engineering,2024,0/0,8.90/10,0/0,0/0,0.52381,0.2,0.7,1,3,0.0445
2,,2,2,0,0,"Amazon Web Services (AWS), Docker, Hadoop, MS-...","Yes, I am available for 3 months starting imme...",Master of Science (M.S.),Data Science And Analytics,2022,0/0,0/0,0/0,0/0,0.126984,0.3,0.7,1,1,0.0
3,,3,2,2,0,"Adobe XD, BIG DATA ANALYTICS, Canva, Data Anal...","Yes, I am available for 3 months starting imme...",Bachelor of Engineering (B.E),Unknown,2024,0/0,0/0,85.60/85.60,10.00/10.00,0.285714,0.2,-0.05,1,3,0.1356
4,,2,2,0,0,"C++ Programming, Data Science, Machine Learnin...","Yes, I am available for 3 months starting imme...",B.Tech,Computer Science,2023,0/0,8.10/10,93.40/93.40,10.00/10.00,0.063492,0.2,0.7,1,2,0.1839
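
The cell above ignores the part after the slash and assumes a scale of 10 or 100. Where the second number is a real maximum (for example `6.50/7`), dividing by it gives a different fraction. The helper below is a hypothetical sketch of that alternative, not the notebook's method; it would not help for entries such as `93.40/93.40`, where the second number repeats the score.

```python
def marks_fraction(entry):
    """Fraction of the stated maximum obtained; '0/0' (missing) maps to 0.0."""
    obtained, maximum = (float(part) for part in entry.split("/"))
    if maximum == 0:
        return 0.0
    return obtained / maximum

for entry in ("6.50/7", "8.10/10", "0/0"):
    print(entry, round(marks_fraction(entry), 4))
# 6.50/7 0.9286
# 8.10/10 0.81
# 0/0 0.0
```

For `6.50/7` the fixed scale of 10 used in the cell gives 0.65, whereas the stated maximum gives about 0.93.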



In [154]:
# Define weights for each feature
# Weights are chosen so that value * weight matches the aggregates listed below
'''
- **Python** (Aggregate: 1.5)
- **Machine Learning** (Aggregate: 1.8)
- **Deep Learning** (Aggregate: 1.8)
- **Natural Language Processing (NLP)** (Aggregate: 1.5)
- **Other Skills Rating** (Aggregate: 0.5)
- **Availability** (Aggregate: 0.5)
- **Degree Rating** (Aggregate: 0.5)
- **Stream Rating** (Aggregate: 1.4)
- **Year of Graduation Rating** (Aggregate: 0.5)
- **Study Rating** (Aggregate: 1)
'''
weights = {
    'Python (out of 3)': 0.5,
    'Machine Learning (out of 3)': 0.6,
    'Natural Language Processing (NLP) (out of 3)': 0.5,
    'Deep Learning (out of 3)': 0.6,
    'Degree_value' : 0.1,
    'Stream_value' : 2,
    'Current Year Of Graduation_value': 0.166,
    'Studies_value' : 2,
    'Availability_value' :0.5
}


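# Multiply each feature by its weight and remember the new column names
cols = []
for item in weights:
    df[item+'_aggregate'] = df[item]*weights[item]
    cols.append(item+'_aggregate')

# Other skills_Aggregate is already scaled to 0-1 and is added without a further weight
df['Score'] = df['Other skills_Aggregate']
for item in cols:
    df['Score'] += df[item]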


'''
 Calculate the score for each intern by multiplying the feature values with their respective weights and summing them up
df['Score'] = (df['Python (out of 3)'] * weights['Python (out of 3)'] +
                 df['Machine Learning (out of 3)'] * weights['Machine Learning (out of 3)'] +
                 df['Natural Language Processing (NLP) (out of 3)'] * weights['Natural Language Processing (NLP) (out of 3)'] +
                 df['Deep Learning (out of 3)'] * weights['Deep Learning (out of 3)'] +
                 df['Degree_value'] * weights['Degree_value'] +
                 df['Other skills_value'] * weights['Other skills_value']+
                 df['Are you available for 3 months, starting immediately, for a full-time work from home internship? ']+
                 df['Stream_value']* weights['Stream_value']+
                 df['Current Year Of Graduation_value']* weights['Current Year Of Graudation_value']+
                 df['Studies_Score']* weights['Studies_Score'])

'''
#

In [155]:
df['Score']

0       2.580238
1       4.030810
2       4.422984
3       5.174914
4       4.883292
          ...   
1131    4.646984
1132    8.851333
1133    7.319414
1134    4.340492
1135    2.129111
Name: Score, Length: 1136, dtype: float64

In [156]:
ranked_data = df.sort_values(by='Score', ascending=False)

# Select the top-ranked intern as the best candidate
best_candidate = ranked_data.iloc[0]

print("Best Candidate:")
print(best_candidate)

Best Candidate:
Name                                                                                                                                               NaN
Python (out of 3)                                                                                                                                    3
Machine Learning (out of 3)                                                                                                                          3
Natural Language Processing (NLP) (out of 3)                                                                                                         3
Deep Learning (out of 3)                                                                                                                             3
Other skills                                                                                         C Programming, C++ Programming, Data Analytics...
Are you available for 3 months, starting immediately, for a full-time work fro

In [157]:
ranked_data.head(10)

Unnamed: 0,Name,Python (out of 3),Machine Learning (out of 3),Natural Language Processing (NLP) (out of 3),Deep Learning (out of 3),Other skills,"Are you available for 3 months, starting immediately, for a full-time work from home internship?",Degree,Stream,Current Year Of Graduation,...,Python (out of 3)_aggregate,Machine Learning (out of 3)_aggregate,Natural Language Processing (NLP) (out of 3)_aggregate,Deep Learning (out of 3)_aggregate,Degree_value_aggregate,Stream_value_aggregate,Current Year Of Graduation_value_aggregate,Studies_value_aggregate,Availability_value_aggregate,Score
465,,3,3,3,3,"C Programming, C++ Programming, Data Analytics...","Yes, I am available for 3 months starting imme...",B.Tech,Computer Science & Engineering,2024,...,1.5,1.8,1.5,1.8,0.02,1.4,0.498,0.0,0.5,9.398952
862,,3,3,3,3,"C++ Programming, Computer Vision, Data Analyti...","No, From 1st august.",B.Tech,Artificial Intelligence And Data Science,2024,...,1.5,1.8,1.5,1.8,0.02,1.4,0.498,0.3593,0.0,9.35349
422,,3,3,3,3,"Deep Learning, Machine Learning, Natural Langu...","Yes, I am available for 3 months starting imme...",Master of Technology (M.Tech),Artificial Intelligence,2023,...,1.5,1.8,1.5,1.8,0.03,1.4,0.332,0.2454,0.5,9.329622
445,,3,3,2,2,".NET, Adobe XD, Amazon Web Server (AWS), Artif...","Yes, I am available for 3 months starting imme...",B.Tech,Computer Science & Engineering,2024,...,1.5,1.8,1.0,1.2,0.02,1.4,0.498,0.3776,0.5,9.279727
182,,3,3,3,2,"Machine Learning, MongoDB, Natural Language Pr...","Yes, I am available for 3 months starting imme...",B.Tech (Hons.),computer science,2024,...,1.5,1.8,1.5,1.2,0.02,1.4,0.498,0.2892,0.5,9.135771
1000,,3,3,3,3,"Business Analysis, Data Analytics, Data Scienc...","Yes, I am available for 3 months starting imme...",Unknown,Data Science And Machine Learning,2022,...,1.5,1.8,1.5,1.8,0.0,1.4,0.166,0.256,0.5,9.112476
674,,3,3,2,3,"Data Analytics, Data Science, Deep Learning, J...","Yes, I am available for 3 months starting imme...",B.Tech,Data Science,2024,...,1.5,1.8,1.0,1.8,0.02,1.4,0.498,0.2656,0.5,9.005822
246,,3,3,3,3,"Deep Learning, Flask, Hadoop, JavaScript, Mach...","Yes, I am available for 3 months starting imme...",Master of Science (M.Sc),Economics,2023,...,1.5,1.8,1.5,1.8,0.03,0.6,0.332,0.493,0.5,8.967698
36,,3,2,3,3,"Computer Vision, Data Science, Deep Learning, ...","Yes, I am available for 3 months starting imme...",B.Tech,computer science,2024,...,1.5,1.2,1.5,1.8,0.02,1.4,0.498,0.182,0.5,8.965079
786,,3,3,3,3,"Computer Vision, Data Analytics, Data Science,...","Yes, I am available for 3 months starting imme...",Bachelor of Science (B.Sc),Computer Science,2022,...,1.5,1.8,1.5,1.8,0.02,1.4,0.166,0.075,0.5,8.935603
