<a href="https://colab.research.google.com/github/Likhitha33/DATA-ANALYSIS/blob/main/CodeforcesData.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [147]:
import numpy as np
import pandas as pd
import requests
import json
from bs4 import BeautifulSoup 

**GETTING DATA FROM CODEFORCES API**

Data can be extracted in three ways:
1.   Using the requests library
2.   Using an API key
3.   Using web scraping







**USING REQUESTS LIBRARY**

With the requests library, we send an HTTP GET request to the required API endpoint. The response body is JSON, which we convert into a pandas DataFrame for the following steps.
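The conversion described above can be sketched on a small hand-copied payload. The two records are taken from the `contest.ratingChanges` output shown further down in this notebook; the helper name `rating_changes_to_frame` is an assumption made for illustration and is not part of the Codeforces API.

```python
import pandas as pd

# Two records copied from the contest.ratingChanges response for contestId=1.
sample_payload = {
    "status": "OK",
    "result": [
        {"contestId": 1, "contestName": "Codeforces Beta Round #1", "handle": "vepifanov",
         "rank": 1, "ratingUpdateTimeSeconds": 1266588000, "oldRating": 1500, "newRating": 1600},
        {"contestId": 1, "contestName": "Codeforces Beta Round #1", "handle": "Orfest",
         "rank": 2, "ratingUpdateTimeSeconds": 1266588000, "oldRating": 1500, "newRating": 1597},
    ],
}


def rating_changes_to_frame(payload):
    """Hypothetical helper: turn a ratingChanges payload into a DataFrame."""
    frame = pd.DataFrame(payload["result"])
    # Keep only the attributes used in this analysis and rename them to the
    # column names used by contest_rating_df.
    frame = frame[["contestId", "handle", "oldRating", "newRating", "rank"]]
    return frame.rename(columns={"handle": "handle_name",
                                 "oldRating": "old_rating",
                                 "newRating": "new_rating"})


sample_df = rating_changes_to_frame(sample_payload)
print(sample_df)
```

With a live request, the same helper would presumably be called as `rating_changes_to_frame(response_rating.json())`.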

**CONTEST RATING DATA**

In [148]:
#Created Pandas Dataframe to store the contest rating data.
contest_rating_df=pd.DataFrame(columns=['contestId','handle_name','old_rating','new_rating','rank'])
contest_rating_df

Unnamed: 0,contestId,handle_name,old_rating,new_rating,rank


Fetch the rating changes of a single contest from the Codeforces API.

In [149]:
#Data of codeforces contestId 1
url_contest_rating='https://codeforces.com/api/contest.ratingChanges?contestId=1'
response_rating=requests.get(url_contest_rating)
response_rating.json()

{'status': 'OK',
 'result': [{'contestId': 1,
   'contestName': 'Codeforces Beta Round #1',
   'handle': 'vepifanov',
   'rank': 1,
   'ratingUpdateTimeSeconds': 1266588000,
   'oldRating': 1500,
   'newRating': 1600},
  {'contestId': 1,
   'contestName': 'Codeforces Beta Round #1',
   'handle': 'Orfest',
   'rank': 2,
   'ratingUpdateTimeSeconds': 1266588000,
   'oldRating': 1500,
   'newRating': 1597},
  {'contestId': 1,
   'contestName': 'Codeforces Beta Round #1',
   'handle': 'NALP',
   'rank': 3,
   'ratingUpdateTimeSeconds': 1266588000,
   'oldRating': 1500,
   'newRating': 1593},
  {'contestId': 1,
   'contestName': 'Codeforces Beta Round #1',
   'handle': 'forest',
   'rank': 4,
   'ratingUpdateTimeSeconds': 1266588000,
   'oldRating': 1500,
   'newRating': 1590},
  {'contestId': 1,
   'contestName': 'Codeforces Beta Round #1',
   'handle': 'rem',
   'rank': 5,
   'ratingUpdateTimeSeconds': 1266588000,
   'oldRating': 1500,
   'newRating': 1580},
  {'contestId': 1,
   'contest

Collect handle_name, old_rating, new_rating and rank from the rating changes of contests 1 to 9 (`range(1,10)` stops before 10).

In [150]:
#Getting the data of contestId from 1 to 9 (range(1,10) excludes 10)
for i in range(1,10):
  url_contest_rating1='https://codeforces.com/api/contest.ratingChanges?contestId='+str(i)
  response_cr=requests.get(url_contest_rating1)
  data=response_cr.json()
  #converting json data to pandas dataframe
  #Here we are only taking the attributes required for analysis
  for contest in data['result']:
    contest_id=contest['contestId']
    handle_name=contest['handle']
    old_rating=contest['oldRating']
    new_rating=contest['newRating']
    rank=contest['rank']

    #Note: DataFrame.append was removed in pandas 2.0; on newer versions use pd.concat instead
    contest_rating_df=contest_rating_df.append({'contestId':contest_id,'handle_name':handle_name,'old_rating':old_rating,
                                              'new_rating':new_rating,'rank':rank},ignore_index=True)
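
`DataFrame.append` was removed in pandas 2.0, so on a recent pandas the loop above raises an `AttributeError`. Below is a minimal sketch of an equivalent that collects plain dictionaries and builds the DataFrame once. `collect_rating_changes` and its `fetch` parameter are hypothetical names, and `fake_fetch` stands in for the network call with made-up ratings so the example runs offline.

```python
import pandas as pd

COLUMNS = ["contestId", "handle_name", "old_rating", "new_rating", "rank"]


def collect_rating_changes(contest_ids, fetch):
    """Build one DataFrame from the ratingChanges payloads of several contests.

    `fetch` is any callable that maps a contest id to the decoded JSON payload,
    for example a small wrapper around requests.get(...).json().
    """
    rows = []
    for contest_id in contest_ids:
        payload = fetch(contest_id)
        for change in payload.get("result", []):
            rows.append({
                "contestId": change["contestId"],
                "handle_name": change["handle"],
                "old_rating": change["oldRating"],
                "new_rating": change["newRating"],
                "rank": change["rank"],
            })
    # One constructor call instead of one append per row.
    return pd.DataFrame(rows, columns=COLUMNS)


def fake_fetch(contest_id):
    # Offline stand-in with made-up ratings; not real Codeforces data.
    return {"status": "OK", "result": [
        {"contestId": contest_id, "handle": f"user{contest_id}", "rank": 1,
         "oldRating": 1500, "newRating": 1500 + contest_id},
    ]}


demo_df = collect_rating_changes(range(1, 10), fake_fetch)
print(demo_df.shape)
```

Building the frame from a list also avoids copying the whole DataFrame on every row, which is what a per-row append does.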

In [151]:
contest_rating_df

Unnamed: 0,contestId,handle_name,old_rating,new_rating,rank
0,1,vepifanov,1500,1600,1
1,1,Orfest,1500,1597,2
2,1,NALP,1500,1593,3
3,1,forest,1500,1590,4
4,1,rem,1500,1580,5
...,...,...,...,...,...
2392,9,TSAM,1500,1386,251
2393,9,nep1965,1500,1386,251
2394,9,humaneitor,1500,1386,251
2395,9,Hernan,1500,1386,251


The contest_rating_df has 2397 rows and 5 columns.

Now that we have the data in the required format (a pandas DataFrame), we have to check whether it is clean.

**DATA CLEANING**

Data cleaning is part of every data science project.
It is the process of fixing or removing incorrect, corrupted, incorrectly formatted, duplicate, or incomplete
data within a dataset.
Once outdated or incorrect records are removed, what remains is higher-quality input for analysis and
model building.

In [152]:
contest_rating_df.dtypes

contestId      object
handle_name    object
old_rating     object
new_rating     object
rank           object
dtype: object
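
Every column is reported as `object`, even though four of them hold numbers. If numeric operations are needed later, those columns can be converted explicitly. Below is a small sketch on made-up rows with the same column names; the values are illustrative and not taken from the dataset.

```python
import pandas as pd

# Object-typed frame that mimics contest_rating_df (illustrative values only).
ratings = pd.DataFrame(
    {"contestId": [1, 1], "handle_name": ["a", "b"],
     "old_rating": [1500, 1500], "new_rating": [1600, 1597], "rank": [1, 2]},
    dtype=object,
)

numeric_columns = ["contestId", "old_rating", "new_rating", "rank"]
# pd.to_numeric converts each object column to a numeric dtype;
# passing errors="coerce" would turn unparseable values into NaN instead of raising.
ratings[numeric_columns] = ratings[numeric_columns].apply(pd.to_numeric)

# Arithmetic now runs on integer columns.
ratings["rating_change"] = ratings["new_rating"] - ratings["old_rating"]
print(ratings.dtypes)
```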

In [153]:
contest_rating_df.isna().sum()

contestId      0
handle_name    0
old_rating     0
new_rating     0
rank           0
dtype: int64

The data has no null values. When nulls are present, this call shows the count per column, and they must be handled before analysis. To demonstrate, the next cell appends two rows with missing values.

In [154]:
contest_rating_df=contest_rating_df.append({'contestId':np.nan,'handle_name':np.nan,'old_rating':np.nan,
                                              'new_rating':np.nan,'rank':np.nan},ignore_index=True)
contest_rating_df=contest_rating_df.append({'contestId':'2378','handle_name':'someHanldename','old_rating':np.nan,
                                              'new_rating':np.nan,'rank':np.nan},ignore_index=True)


In [155]:
contest_rating_df.tail(3)

Unnamed: 0,contestId,handle_name,old_rating,new_rating,rank
2396,9.0,Alisher2,1500.0,1386.0,251.0
2397,,,,,
2398,2378.0,someHanldename,,,


We can see that NaN values have been added to contest_rating_df.

In [156]:
contest_rating_df.isna().sum()

contestId      1
handle_name    1
old_rating     2
new_rating     2
rank           2
dtype: int64

In [157]:
contest_rating_df.dropna(how='any',inplace=True)

In [158]:
contest_rating_df.isna().sum()

contestId      0
handle_name    0
old_rating     0
new_rating     0
rank           0
dtype: int64

We can see that the rows with null values have been removed.
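
`how='any'` drops a row as soon as one of its columns is missing. Below is a short sketch of how that differs from `how='all'` and from `subset`, on a toy frame that mirrors the two dummy rows added above (the values are illustrative).

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    "contestId": [9, np.nan, 2378],
    "handle_name": ["Alisher2", np.nan, "someHandle"],
    "new_rating": [1386, np.nan, np.nan],
})

any_missing = toy.dropna(how="any")                 # keeps only fully populated rows
all_missing = toy.dropna(how="all")                 # drops only the completely empty row
needs_handle = toy.dropna(subset=["handle_name"])   # only handle_name must be present

print(len(any_missing), len(all_missing), len(needs_handle))
```

Which variant fits depends on whether a partially filled row is still useful for the analysis.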

In [159]:
#duplicated is a method: call it and sum the boolean result to count the duplicate rows
contest_rating_df.duplicated().sum()

In [160]:
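#drop_duplicates returns a new DataFrame, so assign the result to keep it
contest_rating_df=contest_rating_df.drop_duplicates()
contest_rating_df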

Unnamed: 0,contestId,handle_name,old_rating,new_rating,rank
0,1,vepifanov,1500,1600,1
1,1,Orfest,1500,1597,2
2,1,NALP,1500,1593,3
3,1,forest,1500,1590,4
4,1,rem,1500,1580,5
...,...,...,...,...,...
2392,9,TSAM,1500,1386,251
2393,9,nep1965,1500,1386,251
2394,9,humaneitor,1500,1386,251
2395,9,Hernan,1500,1386,251


Storing the pandas DataFrame in an Excel file.

In [161]:
contest_rating_df.to_excel("contest_rating.xlsx")

**CONTEST STATUS DATA**

In [162]:
URL = 'https://codeforces.com/api/contest.status?contestId=1&from=1&count=10'
page = requests.get(URL)
soup = BeautifulSoup(page.content, "lxml")  #name the parser explicitly; lxml produces the <html><body> wrapper seen in the output
soup

<html><body><p>{"status":"OK","result":[{"id":183491629,"contestId":1,"creationTimeSeconds":1669905405,"relativeTimeSeconds":2147483647,"problem":{"contestId":1,"index":"A","name":"Theatre Square","type":"PROGRAMMING","rating":1000,"tags":["math"]},"author":{"contestId":1,"members":[{"handle":"Fahim_70"}],"participantType":"PRACTICE","ghost":false,"startTimeSeconds":1266580800},"programmingLanguage":"GNU C11","verdict":"WRONG_ANSWER","testset":"TESTS","passedTestCount":7,"timeConsumedMillis":15,"memoryConsumedBytes":0},{"id":183491505,"contestId":1,"creationTimeSeconds":1669905355,"relativeTimeSeconds":2147483647,"problem":{"contestId":1,"index":"A","name":"Theatre Square","type":"PROGRAMMING","rating":1000,"tags":["math"]},"author":{"contestId":1,"members":[{"handle":"sQ-nax-"}],"participantType":"PRACTICE","ghost":false,"startTimeSeconds":1266580800},"programmingLanguage":"GNU C++17","verdict":"OK","testset":"TESTS","passedTestCount":20,"timeConsumedMillis":15,"memoryConsumedBytes":0

In [163]:
type(soup)

bs4.BeautifulSoup
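
The endpoint returns JSON rather than an HTML page, so the parsed `soup` only wraps the JSON text in `<html><body><p>` tags. Below is a minimal sketch of reading the same kind of payload with the standard `json` module. The string is a shortened copy of the first submission shown above; with `requests`, the equivalent call is `page.json()`.

```python
import json

# Shortened copy of the first record in the contest.status response above.
raw_text = (
    '{"status":"OK","result":[{"id":183491629,"contestId":1,'
    '"problem":{"contestId":1,"index":"A","name":"Theatre Square",'
    '"rating":1000,"tags":["math"]},'
    '"author":{"members":[{"handle":"Fahim_70"}],"participantType":"PRACTICE"},'
    '"programmingLanguage":"GNU C11","verdict":"WRONG_ANSWER"}]}'
)

payload = json.loads(raw_text)   # plain dicts and lists, no HTML parser involved
first = payload["result"][0]
print(first["problem"]["name"], first["author"]["members"][0]["handle"])
```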

In [164]:
contest_status_df1=pd.DataFrame(columns=['contestId','handle_name','ProgrammingLanguage','Problem_Index','Problem_Name','Problem_tags'])
contest_status_df1

Unnamed: 0,contestId,handle_name,ProgrammingLanguage,Problem_Index,Problem_Name,Problem_tags


In [165]:
contest_data=[]
OK = 200
for i in range(1,10):
  url_contest_status='https://codeforces.com/api/contest.status?contestId='+str(i)+'&from=1&count=10'
  response_status=requests.get(url_contest_status)
  if response_status.status_code == OK:
    conteststatus=response_status.json()
    contest_data+=conteststatus['result']
  else:
    print(i,"No Data",response_status.status_code)

#The loop has no break, so the for...else branch always ran; plain statements are equivalent
print(contest_data)
print("WORKED")

[{'id': 183491629, 'contestId': 1, 'creationTimeSeconds': 1669905405, 'relativeTimeSeconds': 2147483647, 'problem': {'contestId': 1, 'index': 'A', 'name': 'Theatre Square', 'type': 'PROGRAMMING', 'rating': 1000, 'tags': ['math']}, 'author': {'contestId': 1, 'members': [{'handle': 'Fahim_70'}], 'participantType': 'PRACTICE', 'ghost': False, 'startTimeSeconds': 1266580800}, 'programmingLanguage': 'GNU C11', 'verdict': 'WRONG_ANSWER', 'testset': 'TESTS', 'passedTestCount': 7, 'timeConsumedMillis': 15, 'memoryConsumedBytes': 0}, {'id': 183491505, 'contestId': 1, 'creationTimeSeconds': 1669905355, 'relativeTimeSeconds': 2147483647, 'problem': {'contestId': 1, 'index': 'A', 'name': 'Theatre Square', 'type': 'PROGRAMMING', 'rating': 1000, 'tags': ['math']}, 'author': {'contestId': 1, 'members': [{'handle': 'sQ-nax-'}], 'participantType': 'PRACTICE', 'ghost': False, 'startTimeSeconds': 1266580800}, 'programmingLanguage': 'GNU C++17', 'verdict': 'OK', 'testset': 'TESTS', 'passedTestCount': 20, 

In [166]:
#Each submission nests the problem and author details; only the first team member's handle is kept
for contest_status in contest_data:
  contest_id=contest_status['contestId']
  problem_index = contest_status['problem']['index']
  problem_name = contest_status['problem']['name']
  problem_tags = contest_status['problem']['tags']
  handle_name = contest_status['author']['members'][0]['handle']
  handle_programmingLanguage = contest_status['programmingLanguage']
  #Note: DataFrame.append was removed in pandas 2.0; on newer versions use pd.concat instead
  contest_status_df1=contest_status_df1.append({'contestId':contest_id,'handle_name':handle_name,'ProgrammingLanguage':handle_programmingLanguage,
                                              'Problem_Index':problem_index,'Problem_Name':problem_name,'Problem_tags':problem_tags},ignore_index=True)
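
The nested lookups above can also be expressed with `pandas.json_normalize`, which flattens nested dictionaries into dotted column names. Below is a sketch on two shortened records from the output above; the helper name `submissions_to_frame` is hypothetical.

```python
import pandas as pd

records = [
    {"contestId": 1, "programmingLanguage": "GNU C11",
     "problem": {"index": "A", "name": "Theatre Square", "tags": ["math"]},
     "author": {"members": [{"handle": "Fahim_70"}]}},
    {"contestId": 1, "programmingLanguage": "GNU C++17",
     "problem": {"index": "A", "name": "Theatre Square", "tags": ["math"]},
     "author": {"members": [{"handle": "sQ-nax-"}]}},
]


def submissions_to_frame(submissions):
    """Hypothetical helper: flatten contest.status records into the notebook's columns."""
    flat = pd.json_normalize(submissions)
    # json_normalize leaves lists untouched, so take the first member's handle by hand.
    flat["handle_name"] = flat["author.members"].apply(lambda members: members[0]["handle"])
    flat = flat.rename(columns={
        "programmingLanguage": "ProgrammingLanguage",
        "problem.index": "Problem_Index",
        "problem.name": "Problem_Name",
        "problem.tags": "Problem_tags",
    })
    return flat[["contestId", "handle_name", "ProgrammingLanguage",
                 "Problem_Index", "Problem_Name", "Problem_tags"]]


status_df = submissions_to_frame(records)
print(status_df)
```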

In [167]:
contest_status_df1

Unnamed: 0,contestId,handle_name,ProgrammingLanguage,Problem_Index,Problem_Name,Problem_tags
0,1,Fahim_70,GNU C11,A,Theatre Square,[math]
1,1,sQ-nax-,GNU C++17,A,Theatre Square,[math]
2,1,crazy_lee,GNU C++14,A,Theatre Square,[math]
3,1,anime,GNU C++17,A,Theatre Square,[math]
4,1,crazy_lee,GNU C++14,A,Theatre Square,[math]
...,...,...,...,...,...,...
85,9,KTL_RoronoaZoro,Java 8,A,Die Roll,"[math, probabilities]"
86,9,KTL_RoronoaZoro,Java 8,A,Die Roll,"[math, probabilities]"
87,9,EdBit,GNU C++17,A,Die Roll,"[math, probabilities]"
88,9,MonaLisaVN,Java 17,A,Die Roll,"[math, probabilities]"


In [168]:
contest_status_df1.dtypes

contestId              object
handle_name            object
ProgrammingLanguage    object
Problem_Index          object
Problem_Name           object
Problem_tags           object
dtype: object

In [169]:
contest_status_df1.isna().sum()

contestId              0
handle_name            0
ProgrammingLanguage    0
Problem_Index          0
Problem_Name           0
Problem_tags           0
dtype: int64

In [170]:
contest_status_df1.isnull().sum()

contestId              0
handle_name            0
ProgrammingLanguage    0
Problem_Index          0
Problem_Name           0
Problem_tags           0
dtype: int64

In [171]:
#duplicated is a method and must be called; Problem_tags holds lists (unhashable), so it is left out of the comparison
contest_status_df1.duplicated(subset=['contestId', 'handle_name','ProgrammingLanguage','Problem_Index','Problem_Name']).sum()

In [172]:
contest_status_df1.head(5)

Unnamed: 0,contestId,handle_name,ProgrammingLanguage,Problem_Index,Problem_Name,Problem_tags
0,1,Fahim_70,GNU C11,A,Theatre Square,[math]
1,1,sQ-nax-,GNU C++17,A,Theatre Square,[math]
2,1,crazy_lee,GNU C++14,A,Theatre Square,[math]
3,1,anime,GNU C++17,A,Theatre Square,[math]
4,1,crazy_lee,GNU C++14,A,Theatre Square,[math]


In [173]:
contest_status_df1.drop_duplicates(subset=['contestId', 'handle_name','ProgrammingLanguage','Problem_Index','Problem_Name'], keep='last',inplace=True)
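
`Problem_tags` holds Python lists, which are unhashable, so that column cannot take part in duplicate detection directly; this is presumably why the `subset` above leaves it out. If the tags should count toward the comparison, one option is to convert them to tuples first. Below is a sketch on illustrative rows.

```python
import pandas as pd

submissions = pd.DataFrame({
    "contestId": [1, 1, 1],
    "handle_name": ["crazy_lee", "anime", "crazy_lee"],
    "ProgrammingLanguage": ["GNU C++14", "GNU C++17", "GNU C++14"],
    "Problem_tags": [["math"], ["math"], ["math"]],
})

# Tuples are hashable, so every column can now be compared.
submissions["Problem_tags"] = submissions["Problem_tags"].apply(tuple)

# keep="last" retains the final occurrence of each duplicated row.
deduplicated = submissions.drop_duplicates(keep="last")
print(deduplicated.index.tolist())
```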

In [174]:
contest_status_df1

Unnamed: 0,contestId,handle_name,ProgrammingLanguage,Problem_Index,Problem_Name,Problem_tags
0,1,Fahim_70,GNU C11,A,Theatre Square,[math]
3,1,anime,GNU C++17,A,Theatre Square,[math]
4,1,crazy_lee,GNU C++14,A,Theatre Square,[math]
6,1,vjudge2,GNU C++17,A,Theatre Square,[math]
7,1,sQ-nax-,GNU C++17,A,Theatre Square,[math]
8,1,bkifhr10,GNU C++17,A,Theatre Square,[math]
9,1,aakashs21,GNU C++17,A,Theatre Square,[math]
13,2,PavelBesp,GNU C++20 (64),B,The least round way,"[dp, math]"
18,2,xuancx,GNU C++17,B,The least round way,"[dp, math]"
19,2,luogu_bot1,GNU C++14,A,Winner,"[hashing, implementation]"


In [175]:
contest_status_df1.to_excel('contest Status.xlsx')

In this project, I used the contest status and contest rating data to look for insights.
The cleaned data from both sources is stored in Excel sheets, which are then used for the analysis.

 

In [176]:
!wget 'https://public.tableau.com/app/profile/likhitha2572/viz/CodeForcesRatingstatusAnalysis/codeforcescontestratinganalysis'

--2022-12-01 14:45:41--  https://public.tableau.com/app/profile/likhitha2572/viz/CodeForcesRatingstatusAnalysis/codeforcescontestratinganalysis
Resolving public.tableau.com (public.tableau.com)... 65.9.86.19, 65.9.86.116, 65.9.86.60, ...
Connecting to public.tableau.com (public.tableau.com)|65.9.86.19|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 32590 (32K) [text/html]
Saving to: ‘codeforcescontestratinganalysis.2’


2022-12-01 14:45:42 (239 KB/s) - ‘codeforcescontestratinganalysis.2’ saved [32590/32590]



From the analysis, for contests 1 to 9, we can conclude:

1.   Overall, most users use "GNU C++14" as their main programming language.
2.   Problems with index A can be considered easy, since they have the largest number of attempts.
3.   The easiest problem in contests 1 to 9 is problem A of contest 4.
4.   We can also look up information about each user who finished at rank 1.
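
The first two conclusions come down to counting rows. Below is a sketch of how such counts could be computed in pandas. It uses only the ten deduplicated rows displayed earlier, not the full dataset, so the numbers it prints are not the analysis results and the ordering may differ from what the full data shows.

```python
import pandas as pd

# Ten rows copied from the deduplicated contest_status_df1 output above.
rows = [
    (1, "Fahim_70", "GNU C11", "A"),
    (1, "anime", "GNU C++17", "A"),
    (1, "crazy_lee", "GNU C++14", "A"),
    (1, "vjudge2", "GNU C++17", "A"),
    (1, "sQ-nax-", "GNU C++17", "A"),
    (1, "bkifhr10", "GNU C++17", "A"),
    (1, "aakashs21", "GNU C++17", "A"),
    (2, "PavelBesp", "GNU C++20 (64)", "B"),
    (2, "xuancx", "GNU C++17", "B"),
    (2, "luogu_bot1", "GNU C++14", "A"),
]
sample = pd.DataFrame(rows, columns=["contestId", "handle_name",
                                     "ProgrammingLanguage", "Problem_Index"])

# Submissions per programming language, most frequent first.
language_counts = sample["ProgrammingLanguage"].value_counts()

# Submissions per problem, identified by contest and problem index.
attempts_per_problem = sample.groupby(["contestId", "Problem_Index"]).size()

print(language_counts)
print(attempts_per_problem)
```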





**SCRAPING DATA USING API KEY**

First, get the API key from the Codeforces website and store it in a variable.

In [177]:
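import os
#Credentials are read from environment variables (these variable names are assumed) so they are not published with the notebook
API_KEY=os.environ['CODEFORCES_API_KEY']
secret_code=os.environ['CODEFORCES_API_SECRET']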

Generally, the API key is passed as a parameter in the request URL. For authorized Codeforces API requests, such as fetching the hacks of a contest, the secret is also required: it is used to compute the `apiSig` signature.
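
For authorized requests the Codeforces API expects the parameters `apiKey`, `time` (Unix time in seconds) and `apiSig`, where `apiSig` is six arbitrary characters followed by the SHA-512 hex digest of `<rand>/<method>?<sorted parameters>#<secret>`. Below is a minimal sketch of that construction. `signed_url` is a hypothetical helper, and the key and secret are placeholders, not real credentials.

```python
import hashlib
import time


def signed_url(method, params, api_key, secret, rand="123456", now=None):
    """Hypothetical helper: build a signed Codeforces API URL."""
    now = int(time.time()) if now is None else now
    all_params = dict(params, apiKey=api_key, time=str(now))
    # Parameters are sorted by name, then by value, before hashing.
    query = "&".join(f"{key}={value}" for key, value in sorted(all_params.items()))
    to_hash = f"{rand}/{method}?{query}#{secret}"
    digest = hashlib.sha512(to_hash.encode("utf-8")).hexdigest()
    return f"https://codeforces.com/api/{method}?{query}&apiSig={rand}{digest}"


# Fixed timestamp and placeholder credentials so the example is reproducible.
example = signed_url("contest.hacks", {"contestId": "1465"},
                     api_key="PLACEHOLDER_KEY", secret="PLACEHOLDER_SECRET",
                     now=1669905405)
print(example[:110])
```

The sketch does not URL-encode values, which is fine for the simple parameters used here but would need handling for values with special characters.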

In [178]:
import time
import hashlib
#Codeforces expects Unix time in seconds; a separate name keeps the time module from being overwritten
timestamp=int(time.time())
times=str(timestamp)

In [179]:
response_list = []

#Authorized request: the parameters are sorted by name, and apiSig is six arbitrary
#characters followed by sha512Hex('<rand>/<method>?<sorted params>#<secret>')
method='contest.hacks'
rand='123456'
params={'apiKey':API_KEY,'contestId':'1465','time':times}
query='&'.join(key+'='+value for key,value in sorted(params.items()))
digest=hashlib.sha512((rand+'/'+method+'?'+query+'#'+secret_code).encode()).hexdigest()
url='https://codeforces.com/api/'+method+'?'+query+'&apiSig='+rand+digest
r = requests.get(url)
response_list.append(r.json())

In [180]:
response_list

[{'status': 'OK',
  'result': [{'id': 690453,
    'creationTimeSeconds': 1608477470,
    'hacker': {'contestId': 1465,
     'members': [{'handle': 'tachyon2507'}],
     'participantType': 'CONTESTANT',
     'ghost': False,
     'room': 207,
     'startTimeSeconds': 1608476700},
    'defender': {'contestId': 1465,
     'members': [{'handle': 'noddy2.0'}],
     'participantType': 'CONTESTANT',
     'ghost': False,
     'room': 207,
     'startTimeSeconds': 1608476700},
    'verdict': 'INVALID_INPUT',
    'problem': {'contestId': 1465,
     'index': 'A',
     'name': 'In-game Chat',
     'type': 'PROGRAMMING',
     'points': 500.0,
     'rating': 800,
     'tags': ['implementation', 'strings']},
    'judgeProtocol': {'protocol': 'Validator \'validator.exe\' returns exit code 3 [FAIL Expected integer, but "))))))))))))))" found (stdin, line 1)]',
     'manual': 'false',
     'verdict': 'Invalid input'}},
   {'id': 690454,
    'creationTimeSeconds': 1608477493,
    'hacker': {'contestId': 1

In [181]:
contest_hacks_df=pd.DataFrame(columns=['Problem_Name','handle_name','hackstatus','hacker_name','problem_index'])
contest_hacks_df

Unnamed: 0,Problem_Name,handle_name,hackstatus,hacker_name,problem_index


In [182]:
#if response_list[0]['status'] == 'OK':
response=response_list[0]['result']
for hacker in  response:
    problem_name=hacker['problem']['name']
    #Note the column naming: handle_name stores the hacker's handle, hacker_name stores the defender's handle
    handle_name=hacker['hacker']['members'][0]['handle']
    hackstatus=hacker['judgeProtocol']['verdict']
    hacker_name=hacker['defender']['members'][0]['handle']
    problem_id=hacker['problem']['index']

    #Note: DataFrame.append was removed in pandas 2.0; on newer versions use pd.concat instead
    contest_hacks_df=contest_hacks_df.append({'Problem_Name':problem_name,'handle_name':handle_name,'hackstatus':hackstatus,
                                              'hacker_name':hacker_name,'problem_index':problem_id},ignore_index=True)
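
The commented-out status check above can be made explicit. Codeforces responses carry a `status` field that is either `OK` or `FAILED`, with a `comment` describing the failure. Below is a sketch of such a guard; `extract_result` is a hypothetical helper name and the failing payload is made up.

```python
def extract_result(payload):
    """Hypothetical helper: return payload['result'] or raise with the API's comment."""
    if payload.get("status") != "OK":
        raise RuntimeError(f"Codeforces API error: {payload.get('comment', 'no comment')}")
    return payload["result"]


ok_payload = {"status": "OK", "result": [{"id": 690453, "verdict": "INVALID_INPUT"}]}
failed_payload = {"status": "FAILED", "comment": "example failure message"}

hacks = extract_result(ok_payload)
try:
    extract_result(failed_payload)
    error_message = None
except RuntimeError as error:
    error_message = str(error)

print(len(hacks), error_message)
```

Failing early this way avoids a `KeyError` on `'result'` when a request is rejected.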


In [183]:
contest_hacks_df

Unnamed: 0,Problem_Name,handle_name,hackstatus,hacker_name,problem_index
0,In-game Chat,tachyon2507,Invalid input,noddy2.0,A
1,In-game Chat,tachyon2507,Unsuccessful hacking attempt,noddy2.0,A
2,In-game Chat,tachyon2507,Unsuccessful hacking attempt,shivamk012,A
3,Fair Numbers,vk99,Unsuccessful hacking attempt,aadee_07,B
4,Fair Numbers,not_so_good_at_coding,Unsuccessful hacking attempt,IanKlein,B
...,...,...,...,...,...
455,Fair Numbers,pajenegod,Unsuccessful hacking attempt,fan_balae,B
456,Fair Numbers,pajenegod,Unsuccessful hacking attempt,codeforpractise,B
457,Fair Numbers,pajenegod,Successful hacking attempt,modito,B
458,Fair Numbers,pajenegod,Unsuccessful hacking attempt,kappa69,B


In [188]:
contest_hacks_df.dtypes

Problem_Name     object
handle_name      object
hackstatus       object
hacker_name      object
problem_index    object
dtype: object

**DATA CLEANING**

In [189]:
contest_hacks_df.isna().sum()

Problem_Name     0
handle_name      0
hackstatus       0
hacker_name      0
problem_index    0
dtype: int64

In [190]:
contest_hacks_df.dropna(how='any',inplace=True)

In [193]:
#duplicated is a method: call it and sum the boolean result to count the duplicate rows
contest_hacks_df.duplicated().sum()

In [194]:
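#drop_duplicates returns a new DataFrame, so assign the result to keep it
contest_hacks_df=contest_hacks_df.drop_duplicates()
contest_hacks_df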

Unnamed: 0,Problem_Name,handle_name,hackstatus,hacker_name,problem_index
0,In-game Chat,tachyon2507,Invalid input,noddy2.0,A
1,In-game Chat,tachyon2507,Unsuccessful hacking attempt,noddy2.0,A
2,In-game Chat,tachyon2507,Unsuccessful hacking attempt,shivamk012,A
3,Fair Numbers,vk99,Unsuccessful hacking attempt,aadee_07,B
4,Fair Numbers,not_so_good_at_coding,Unsuccessful hacking attempt,IanKlein,B
...,...,...,...,...,...
455,Fair Numbers,pajenegod,Unsuccessful hacking attempt,fan_balae,B
456,Fair Numbers,pajenegod,Unsuccessful hacking attempt,codeforpractise,B
457,Fair Numbers,pajenegod,Successful hacking attempt,modito,B
458,Fair Numbers,pajenegod,Unsuccessful hacking attempt,kappa69,B


Now the data looks good and can be used for further analysis.