# Some ground rules for the assignments:

For all assignments (this one and any future assignment including the final project): 

* Do not download and save the data locally unless your data is very big (~TBs). I **do not** want to see you opening a local file for the data I gave you as an URL. Anything local is suspect: local files can't be trusted (they might be manipulated, changed, modified, tempered.) Refer to my lecture notes on how to pull data from an URL using `urlopen`.

* All computations must be done locally within python. Nothing external: no manual input, no excel, no SQL, no java, etc.

* All code has to be explained. Explain your reasoning and your choices. If you installed a third party library (including `numpy`, `scipy`, `pandas` etc), explain which part you import and what that function does etc. 

* Explain your code using a markdown cell. **Do not** use code comments starting with `#` to do your explanations.

* Do not use `if __name__ == "__main__"`. EVER! If you are using that within jupyter, I am going to assume you found the solution on the internet and you cut/pasted without understanding what that peice of code did.

# Question 1

Istanbul municipality has a open data service, and it provides detailed information about their services. For this question, use the data given at [this link](https://data.ibb.gov.tr/dataset/istanbul-sehir-hatlari-iskeleleri).

1. Understand what the data is for. Explain what it is, what it records, what pieces it has in broad strokes. 

2. Data (among other things) contains geographical locations of Sea Stations ('Iskele') of Istanbul Deniz Isletmeleri boats operating in Istanbul.  Extract the locations of these stations as a pandas dataframe with 2 columns: latitudes and longitudes. The index of the dataframe has to be the station names.

The data is in XML format. Do not use any external libraries other than numpy and pandas. Use `xmltodict` to convert it into a dictionary then extract the necessary parts.

In [1]:
from xmltodict import parse
from urllib.request import urlopen
import pandas as pd

## Libraries
1. parse: use this library to parse our data in xml format and convert it to dictionary format.
2. pandas: use for dataframe process

In [2]:
with urlopen("https://data.ibb.gov.tr/dataset/b47b5391-bcca-4bb3-a575-8ece68901d5d/resource/bcbeff5d-14d7-4ec0-a211-4c4d3e96cfba/download/istanbul-ehir-hatlar-iskeleleri.kml") as url:
    raw = parse(url,encoding='utf-8')
raw = raw['kml']['Document']['Folder']['Folder']
data=pd.DataFrame(raw)

# create and analyze the data
1. We pull the data from the url and parse it into dictionary.
2. Then get the part we are working on from the raw data
3. Finally, we turn the data into a dataframe for better processing.

In [3]:
data

Unnamed: 0,name,open,Placemark,LookAt,Style
0,MERKEZ İSKELELER,1.0,"[{'name': 'MALTEPE', 'LookAt': {'gx:TimeStamp'...",,
1,HALİÇ İSKELELERİ,,"[{'name': 'AYVANSARAY', 'LookAt': {'gx:TimeSpa...",,
2,ADA İSKELELERİ,,"[{'name': 'BURGAZADA', 'LookAt': {'gx:TimeStam...",,
3,BOĞAZ İSKELELERİ,,"[{'name': 'ANADOLU HİSARI', 'LookAt': {'gx:Tim...","{'longitude': '29.08634193274094', 'latitude':...","{'ListStyle': {'listItemType': 'check', 'bgCol..."
4,ARABALI İSKELELER,,"[{'name': 'ÇUBUKLU ARABALI', 'Camera': {'gx:Ti...",,


When we examine the data, we see that the data divides the sea stations into 5 zones and shows the names and locations of the sea stations in these regions in the placemark area. 

#  Placemarks

In [4]:
placemark=data['Placemark']
length =len(placemark)
df=pd.DataFrame({'a' : []})
for i in range(length):
    df2 = pd.DataFrame.from_dict(placemark[i])
    df=pd.concat([df, df2], ignore_index=True)
del df['a']


1. we define an empty dataframe to gather all the placemarks together
2. Using for loop, we reach all the placemarks and combine with new dataframe. 

In [5]:
df

Unnamed: 0,name,LookAt,styleUrl,Point,Camera
0,MALTEPE,"{'gx:TimeStamp': {'when': '2020-06-16'}, 'gx:V...",#msn_marina8,"{'gx:drawOrder': '1', 'coordinates': '29.12942...",
1,AHIRKAPI,,#msn_marina8,"{'gx:drawOrder': '1', 'coordinates': '28.98235...","{'gx:TimeStamp': {'when': '2020-08-25'}, 'gx:V..."
2,BEŞİKTAŞ-1,"{'gx:TimeStamp': {'when': '2017-06-14'}, 'gx:V...",#msn_marina0201,"{'gx:drawOrder': '1', 'coordinates': '29.00734...",
3,BEŞİKTAŞ-2,"{'gx:TimeStamp': {'when': '2014-07-10'}, 'gx:V...",#msn_marina0201,"{'gx:drawOrder': '1', 'coordinates': '29.00558...",
4,BOSTANCI,"{'gx:TimeStamp': {'when': '2017-06-14'}, 'gx:V...",#msn_marina9,"{'gx:drawOrder': '1', 'coordinates': '29.09401...",
5,EMİNÖNÜ-1,,#msn_marina5,"{'gx:drawOrder': '1', 'coordinates': '28.97621...","{'gx:TimeStamp': {'when': '2014-07-10'}, 'gx:V..."
6,EMİNÖNÜ-2,,#msn_marina0201,"{'gx:drawOrder': '1', 'coordinates': '28.97515...","{'gx:TimeStamp': {'when': '2014-07-10'}, 'gx:V..."
7,EMİNÖNÜ-3,,#msn_marina10000,"{'gx:drawOrder': '1', 'coordinates': '28.97406...","{'gx:TimeStamp': {'when': '2014-07-10'}, 'gx:V..."
8,EMİNÖNÜ-4,,#msn_marina10000,"{'gx:drawOrder': '1', 'coordinates': '28.97300...","{'gx:TimeStamp': {'when': '2014-07-10'}, 'gx:V..."
9,HAYDARPAŞA,,#msn_marina21,"{'gx:drawOrder': '1', 'coordinates': '29.01794...","{'gx:TimeStamp': {'when': '2017-06-14'}, 'gx:V..."


In [6]:
[x for x in df]
df['LookAt']=df['LookAt'].fillna(0)
lenght = len(df)
for i in range(lenght):
    if df['LookAt'][i] == 0 :
        df['LookAt'][i] = df['Camera'][i]

After combining the placemarks I realized that the longitudes and latitudes are not in one column.Unfortunately, I can't pull the data as they are not in a single column. Therefore I combined the two columns. To be able to do this I set the Nan fields to 0.

In [7]:
Longitude = []
Latitude = []
for i in range(lenght):
    Longitude.append(df['LookAt'][i]['longitude'])
    Latitude.append(df['LookAt'][i]['latitude'])
df.insert(1, "Longitudes",Longitude)
df.insert(2, "Latitudes",Latitude)    

The question asked me for the longitudes and latitudes of the sea stations so I defined two arrays to hold these fields from the dataframe.
Then i added these arrays to dataframe.

In [8]:
del df['Camera']
del df['styleUrl']
del df['Point']
del df['LookAt']

Finally, delete  the unnecessary columns.

In [9]:
df

Unnamed: 0,name,Longitudes,Latitudes
0,MALTEPE,29.13060758098593,40.91681013544846
1,AHIRKAPI,28.98289668101853,41.00314456999032
2,BEŞİKTAŞ-1,29.00778819900819,41.04116198628195
3,BEŞİKTAŞ-2,29.0055048939288,41.04065414312002
4,BOSTANCI,29.09425745312653,40.95173395654253
5,EMİNÖNÜ-1,28.97621869809887,41.01495987953694
6,EMİNÖNÜ-2,28.97621869809887,41.01495987953694
7,EMİNÖNÜ-3,28.97495985342729,41.01488637107048
8,EMİNÖNÜ-4,28.97495985342729,41.01488637107048
9,HAYDARPAŞA,29.01810215560077,40.99577360085738


# Question 2

For this question we are going to use Istanbul Municipality data at [this link](https://data.ibb.gov.tr/dataset/sehir-hatlari-sefer-sayilari). Data contains Istanbul Deniz Isletmeleri route information.

1. Understand what the data is for, what it records, what it contains. Explain it in broad strokes.
2. Extract the data about the number of trips between stations.
2. Calculate the total number of trips in 2020,
3. Calculate the total number of trips in 2021, 
4. Which is the busiest station in 2020 and 2021?

For these questions you must extract the specific data you need from the raw data. Let me be very clear: I am not interested in the numerical answer, I'd like to see your data extraction and calculation explicitly. I need to see your code with which you extract the data, see the data frame where you record the extracted data, and the code where you group and calculate the required results.

# create and analyze the data (2020)

In [10]:
with urlopen("https://data.ibb.gov.tr/dataset/4be0d5f6-62de-4a24-a6a5-038009cde39f/resource/f1f95d5d-fa2f-479d-9d50-85ca1d604c1e/download/2020-yl-ehir-hatlar-sefer-saylar.csv") as url:
    data_2020 = pd.read_csv(url,encoding='latin-1')
data_2020

Unnamed: 0,YIL;GÜZERGAH;TOPLAM SEFER ADETÝ
0,2020;BEÞÝKTAÞ - KADIKÖY;26.879
1,2020;KADIKÖY - KARAKÖY - BEÞÝKTAÞ;13
2,2020;EMÝNÖNÜ - ÜSKÜDAR;28.441
3,2020;ÜSKÜDAR - KARAKÖY - EMÝNÖNÜ;8.737
4,2020;KADIKÖY - EMÝNÖNÜ;18.408
5,2020;KADIKÖY - KARAKÖY;25.658
6,2020;KABATAÞ - KADIKÖY - ADALAR - BOSTANCI;5.879
7,2020;ÝSTANBUL - ADALAR;4.542
8,2020;KADIKÖY - KARAKÖY - EMÝNÖNÜ;11.156
9,2020;BOÐAZ GÝDÝÞ GELÝÞ (EMÝNÖNÜ - BEÞÝKTAÞ -K...


We use another encoding because there are characters in the data that are not suitable for utf8

In [11]:
data_2020[['yıl', 'güzergah', 'toplam sefer adedi']]=data_2020['YIL;GÜZERGAH;TOPLAM SEFER ADETÝ'].str.split(';', expand=True)
del data_2020['YIL;GÜZERGAH;TOPLAM SEFER ADETÝ']

We divide the fields in the data according to the year, route and total number of trips.

In [12]:
data_2020['güzergah'] = data_2020['güzergah'].str.replace('Þ','Ş')
data_2020['güzergah'] = data_2020['güzergah'].str.replace('Ý','İ')
data_2020['güzergah'] = data_2020['güzergah'].str.replace('Ð','Ğ')

I'm replacing some letters with the letters they should be to make it look neater

# find the total number of trips

In [13]:
data_2020.dtypes

yıl                   object
güzergah              object
toplam sefer adedi    object
dtype: object

In [14]:
data_2020['toplam sefer adedi'] = data_2020['toplam sefer adedi'].str.replace('.','' ,regex=True)

In order to find the total number of trips made in 2020, I needed to collect all the trips, but the data in this column was kept as an object. I removed the dots from the numbers to convert this data into integers.

In [15]:
data_2020['toplam sefer adedi'] = data_2020['toplam sefer adedi'].astype(int)
data_2020

Unnamed: 0,yıl,güzergah,toplam sefer adedi
0,2020,BEŞİKTAŞ - KADIKÖY,26879
1,2020,KADIKÖY - KARAKÖY - BEŞİKTAŞ,13
2,2020,EMİNÖNÜ - ÜSKÜDAR,28441
3,2020,ÜSKÜDAR - KARAKÖY - EMİNÖNÜ,8737
4,2020,KADIKÖY - EMİNÖNÜ,18408
5,2020,KADIKÖY - KARAKÖY,25658
6,2020,KABATAŞ - KADIKÖY - ADALAR - BOSTANCI,5879
7,2020,İSTANBUL - ADALAR,4542
8,2020,KADIKÖY - KARAKÖY - EMİNÖNÜ,11156
9,2020,BOĞAZ GİDİŞ GELİŞ (EMİNÖNÜ - BEŞİKTAŞ -KUZGUN...,523


* Changing the necessary columns type as integer

In [16]:
data_2020['toplam sefer adedi'].sum()

193669

Result is the total number of trips in 2020.

# create and analyze the data (2021)

In [17]:
with urlopen("https://data.ibb.gov.tr/dataset/4be0d5f6-62de-4a24-a6a5-038009cde39f/resource/d2c7e4c3-fd09-4952-8a8e-776e3accf91d/download/2021-yl-ehir-hatlar-sefer-saylar.csv") as url:
    data_2021 = pd.read_csv(url,encoding='latin-1')

In [18]:
data_2021[['yıl', 'güzergah', 'toplam sefer adedi']]=data_2021['Yil;Guzergah;Toplam Sefer Adeti'].str.split(';', expand=True)
del data_2021['Yil;Guzergah;Toplam Sefer Adeti']
data_2021['güzergah'] = data_2021['güzergah'].str.replace('Þ','Ş')
data_2021['güzergah'] = data_2021['güzergah'].str.replace('Ý','İ')
data_2021['güzergah'] = data_2021['güzergah'].str.replace('Ð','Ğ')

*While creating the data, I applied the same solutions as the previous data.*

# find the total number of trips

In [19]:
data_2021.dtypes

yıl                   object
güzergah              object
toplam sefer adedi    object
dtype: object

In [20]:
data_2021['toplam sefer adedi'] = data_2021['toplam sefer adedi'].str.replace('.','' ,regex=True)
data_2021=data_2021[data_2021['yıl'].str.contains("2021")==True]

We can't convert to integers because of null values. so we drop all rows which year value is not 2021.

In [21]:
data_2021['toplam sefer adedi'] = data_2021['toplam sefer adedi'].astype(int)
data_2021

Unnamed: 0,yıl,güzergah,toplam sefer adedi
0,2021,BEŞİKTAŞ-KADIKÖY,23658
1,2021,EMİNÖNÜ-ÜSKÜDAR,23854
2,2021,EMİNÖNÜ-KADIKÖY,18298
3,2021,EMİNÖNÜ-BEŞİKTAŞ-KUZGUNCUK-BEYLERBEYİ-ÇENGELKÖ...,497
4,2021,EMİNÖNÜ-BEŞİKTAŞ-ORTAKÖY-EMİRGAN-PAŞABAHÇE-BEY...,545
5,2021,ÇENGELKÖY-BEŞİKTAŞ-EMİNÖNÜ,433
6,2021,KADIKÖY-KARAKÖY,6168
7,2021,KADIKÖY-KARAKÖY-EMİNÖNÜ,18304
8,2021,KABATAŞ-KADIKÖY-ADALAR,7046
9,2021,BOSTANCI- BÜYÜKADA-HEYBELİADA,940


In [22]:
data_2021['toplam sefer adedi'].sum()

177882

Result is the total number of trips in 2021.

In [23]:
data_split_2021 =data_2021['güzergah'].str.split('-', expand=True)
data_split_2021

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,10,11
0,BEŞİKTAŞ,KADIKÖY,,,,,,,,,,
1,EMİNÖNÜ,ÜSKÜDAR,,,,,,,,,,
2,EMİNÖNÜ,KADIKÖY,,,,,,,,,,
3,EMİNÖNÜ,BEŞİKTAŞ,KUZGUNCUK,BEYLERBEYİ,ÇENGELKÖY,ARNAVUTKÖY,,,,,,
4,EMİNÖNÜ,BEŞİKTAŞ,ORTAKÖY,EMİRGAN,PAŞABAHÇE,BEYKOZ,,,,,,
5,ÇENGELKÖY,BEŞİKTAŞ,EMİNÖNÜ,,,,,,,,,
6,KADIKÖY,KARAKÖY,,,,,,,,,,
7,KADIKÖY,KARAKÖY,EMİNÖNÜ,,,,,,,,,
8,KABATAŞ,KADIKÖY,ADALAR,,,,,,,,,
9,BOSTANCI,BÜYÜKADA,HEYBELİADA,,,,,,,,,


* we are splitting route's stations as columns to be able to calculate.

In [24]:
column_num = data_split_2021.shape[1]-1


In [25]:
data_split_2021[12] = data_2021['toplam sefer adedi']
data_split_2021

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,10,11,12
0,BEŞİKTAŞ,KADIKÖY,,,,,,,,,,,23658
1,EMİNÖNÜ,ÜSKÜDAR,,,,,,,,,,,23854
2,EMİNÖNÜ,KADIKÖY,,,,,,,,,,,18298
3,EMİNÖNÜ,BEŞİKTAŞ,KUZGUNCUK,BEYLERBEYİ,ÇENGELKÖY,ARNAVUTKÖY,,,,,,,497
4,EMİNÖNÜ,BEŞİKTAŞ,ORTAKÖY,EMİRGAN,PAŞABAHÇE,BEYKOZ,,,,,,,545
5,ÇENGELKÖY,BEŞİKTAŞ,EMİNÖNÜ,,,,,,,,,,433
6,KADIKÖY,KARAKÖY,,,,,,,,,,,6168
7,KADIKÖY,KARAKÖY,EMİNÖNÜ,,,,,,,,,,18304
8,KABATAŞ,KADIKÖY,ADALAR,,,,,,,,,,7046
9,BOSTANCI,BÜYÜKADA,HEYBELİADA,,,,,,,,,,940


* Adding *toplam seferi adeti* as last column of the stations dataframe

In [26]:
new_split_2021 = pd.DataFrame(columns=['istasyon adı','yolcu sayısı'])
df_split = pd.DataFrame(columns=['istasyon adı','yolcu sayısı'])
for i in range(column_num):
    new_split_2021['istasyon adı'] = data_split_2021[i]
    new_split_2021['yolcu sayısı'] = data_split_2021[12]
    df_split = pd.concat([df_split, new_split_2021], ignore_index=True)

* taking *istasyon adı* and *yolcu sayısı* values and describing as a new data frame for every column then concating them together.
* so we will have a new df which all stations stands by itself with it's *yolcu sayısı*

In [27]:
df_split

Unnamed: 0,istasyon adı,yolcu sayısı
0,BEŞİKTAŞ,23658
1,EMİNÖNÜ,23854
2,EMİNÖNÜ,18298
3,EMİNÖNÜ,497
4,EMİNÖNÜ,545
...,...,...
413,,3262
414,,389
415,,249
416,,248


In [28]:
df_unique_st=df_split['istasyon adı'].unique()

In [29]:
e=0
passenger_list=[]
for j in df_unique_st:
    for i in range(len(df_split)):
        if df_split['istasyon adı'][i]==j:
            e+=df_split['yolcu sayısı'][i]
        
    passenger_list.append(e)
    e=0

In [30]:
result_df =pd.DataFrame({'istasyon adı':df_unique_st , 'toplam yolcu sayısı':passenger_list})

* So now we have all the unique stations names as a list and all the summed passenger numbers by all stations.
* Now we will use both to create a new dataframe

In [31]:
result_df

Unnamed: 0,istasyon adı,toplam yolcu sayısı
0,BEŞİKTAŞ,41970
1,EMİNÖNÜ,72252
2,ÇENGELKÖY,12243
3,KADIKÖY,53045
4,KABATAŞ,13940
...,...,...
61,SÜTLÜCE,9106
62,İSTİNYE,3499
63,EYÜP,9106
64,EMİNÖNÜ,594


# Question 3

Using a different [dataset](https://data.ibb.gov.tr/dataset/istanbul-deniz-iskeleleri-yolcu-sayilari) again from Istanbul Municipality on Istanbul Deniz Isletmeleri: 

1. Understand what the data is for, what it records, what it contains. Explain it in broad strokes.
2. Find out the busiest station in the years 2020 and 2021,
3. Repeat the same calculation monthly: find the busiest stations for each month.
4. Does your calculation of busiest stations agree with the calculation you made in Question #2? Explain.


In [32]:
with urlopen("https://data.ibb.gov.tr/dataset/20f33ff0-1ab3-4378-9998-486e28242f48/resource/6fbdd928-8c37-43a4-8e6a-ba0fa7f767fb/download/istanbul-deniz-iskeleleri-yolcu-saylar.csv") as url:
    data = pd.read_csv(url,encoding='latin-1')

We use another encoding again because there are characters in the data that are not suitable for utf8

In [33]:
data[['yıl', 'ay','otorite adı', 'istasyon adı', 'yolcu sayısı']]=data['Yil;Ay;Otorite Adi;Istasyon Adi;Yolcu Sayisi'].str.split(';', expand=True)

Correcting the column names and separating them by *;* to have a clear result

In [34]:
del data['Yil;Ay;Otorite Adi;Istasyon Adi;Yolcu Sayisi']

In [35]:
data.yıl.unique()

array(['2021'], dtype=object)

We don't have any 2020 values in this data.

In [36]:
for i in range(len(data)):
    data['istasyon adı'][i]=data['istasyon adı'][i].upper()

We have a very bad data and have to clean it.
first we make them all uppercase to avoid differences.

In [37]:
data['istasyon adı']= data['istasyon adı'].str.replace('Þ','Ş')
data['istasyon adı']= data['istasyon adı'].str.replace('Ü','U')
data['istasyon adı']= data['istasyon adı'].str.replace('Ö','O')
data['istasyon adı']= data['istasyon adı'].str.replace('İ','I')
data['istasyon adı']= data['istasyon adı'].str.replace('Ý','I')
data['istasyon adı']= data['istasyon adı'].str.replace('Ð','G')
data['istasyon adı']= data['istasyon adı'].str.replace('Ğ','G')

And to have a that all stations have only one name (nothing like ÇUBUKLU - cubuklu) we are making some corrections.

In [38]:
data['istasyon adı']= data['istasyon adı'].str.replace('Ş','S')

In [39]:
data=data.sort_values(by='istasyon adı')

In [40]:
data['yolcu sayısı']=data['yolcu sayısı'].astype(int)


In [41]:
df_ist=data['istasyon adı'].unique()

having all unique station names in a list.

In [42]:
data

Unnamed: 0,yıl,ay,otorite adı,istasyon adı,yolcu sayısı
595,2021,11,ÝBB Toplu Ulaþým Hizmetleri Müdürlüðü,ANADOLU HISARI,3032
444,2021,9,ÝBB Toplu Ulaþým Hizmetleri Müdürlüðü,ANADOLU HISARI,8566
76,2021,4,ÝBB Toplu Ulaþým Hizmetleri Müdürlüðü,ANADOLU HISARI,3690
520,2021,10,ÝBB Toplu Ulaþým Hizmetleri Müdürlüðü,ANADOLU HISARI,9311
218,2021,6,ÝBB Toplu Ulaþým Hizmetleri Müdürlüðü,ANADOLU HISARI,5693
...,...,...,...,...,...
643,2021,11,ÝSTANBUL ÞEHÝR HATLARI TUR. SAN. VE TÝC. AÞ.,ÇUBUKLU ARABALI,7848
267,2021,6,ÝSTANBUL ÞEHÝR HATLARI TUR. SAN. VE TÝC. AÞ.,ÇUBUKLU ARABALI,18642
194,2021,5,ÝSTANBUL ÞEHÝR HATLARI TUR. SAN. VE TÝC. AÞ.,ÇUBUKLU ARABALI,9404
54,2021,3,ÝSTANBUL ÞEHÝR HATLARI TUR. SAN. VE TÝC. AÞ.,ÇUBUKLU ARABALI,4263


In [43]:
d=0
acc=[]
for j in df_ist:
    for i in range(len(data)):
        if data['istasyon adı'][i]==j:
            d+=data['yolcu sayısı'][i]
        
    acc.append(d)
    d=0

iterating over sorted data frame to find same named stations and adding their *yolcu sayi*'s together

In [44]:
df=pd.DataFrame({'istasyon adı':df_ist , 'toplam yolcu sayısı':acc})

merging station names and passenger counts together.

In [45]:
for i in range(len(df)):
    if df['toplam yolcu sayısı'][i]==df['toplam yolcu sayısı'].max():
        print("the busiest station is" ,df['istasyon adı'][i] , ":" ,df['toplam yolcu sayısı'][i] )

the busiest station is USKUDAR : 6651307


In [46]:
df

Unnamed: 0,istasyon adı,toplam yolcu sayısı
0,ANADOLU HISARI,48588
1,ANADOLU KAVAGI,68503
2,ARNAVUTKOY,107144
3,AYVANSARAY,13239
4,BALAT,33544
5,BARBAROS,1119982
6,BEBEK,53590
7,BESIKTAS,4242605
8,BEYKOZ,225292
9,BEYLERBEYI,32303


In [47]:
data['ay']=data['ay'].astype(int)

To do the same thing for months first of all to be able to make calculations we change 'ay' columns type as integer

In [48]:
month=data.sort_values(by='ay')

In [49]:
ay=month['ay'].unique()

In [50]:
d=0
for i in ay:
    data_ay= month.loc[month['ay'] == i]
    ist_ay=data_ay.d=0

df = pd.DataFrame(columns=['ay','istasyon adı','yolcu sayısı'])
for k in month['ay'].unique():
    for j in df_ist:
        for  i in range(len(data)):
            if data['istasyon adı'][i]==j:
                if data['ay'][i] == k:
                    d+=data['yolcu sayısı'][i]
        df.loc[len(df.index)] = [k, j, d]            
        d=0
df.drop(0)
    
    

Unnamed: 0,ay,istasyon adı,yolcu sayısı
1,3,ANADOLU KAVAGI,1230
2,3,ARNAVUTKOY,3136
3,3,AYVANSARAY,329
4,3,BALAT,889
5,3,BARBAROS,29500
...,...,...,...
472,11,USKUDAR,350978
473,11,YENI KABATAS,43205
474,11,YENIKAPI,6
475,11,YENIKOY,10736


To matching months with it's station names and passenger counts I made an an algorithm gives as a dataframe we need.

In [51]:
for i in ay:
    aylik_data = df.loc[df['ay'] == i]
    sorted_yolcu=aylik_data.sort_values(by='yolcu sayısı', ascending=False)
    yolcu=sorted_yolcu.iloc[0]
    print(yolcu)

ay                    3
istasyon adı    USKUDAR
yolcu sayısı     178944
Name: 48, dtype: object
ay                    4
istasyon adı    USKUDAR
yolcu sayısı     472875
Name: 101, dtype: object
ay                    5
istasyon adı    USKUDAR
yolcu sayısı     370665
Name: 154, dtype: object
ay                    6
istasyon adı    USKUDAR
yolcu sayısı     830520
Name: 207, dtype: object
ay                    7
istasyon adı    USKUDAR
yolcu sayısı    1120694
Name: 260, dtype: object
ay                    8
istasyon adı    USKUDAR
yolcu sayısı    1081644
Name: 313, dtype: object
ay                    9
istasyon adı    USKUDAR
yolcu sayısı    1048105
Name: 366, dtype: object
ay                   10
istasyon adı    USKUDAR
yolcu sayısı    1196882
Name: 419, dtype: object
ay                   11
istasyon adı    USKUDAR
yolcu sayısı     350978
Name: 472, dtype: object


With this last for block we calculated busiest station for every month and it's Üsküdar same result I found in Question 2 and that's logical.