# <font color='green'> <b>Importing Libraries </b><font color='black'>

In [1]:
import numpy as np
import pandas as pd

## <font color='blue'> <b>Basic Attributes & Methods of Series</b><font color='black'>

**SOME COMMON ATTRIBUTES & METHODS** [Official Pandas API Document](https://pandas.pydata.org/docs/reference/api/pandas.Series.html)<br>

**Series.values:** Returns the values of the Series as a NumPy array.

**Series.index:** Returns the index (row labels) of the Series.

**Series.dtype:** Returns the data type of the Series.

**Series.size:** Returns the number of elements in the Series.

**Series.shape:** Returns the shape of the Series as a tuple, e.g. `(8,)`.

**Series.ndim:** Returns the number of dimensions (always 1 for a Series).

**Series.head():** Returns the first n elements (n defaults to 5).

**Series.tail():** Returns the last n elements (n defaults to 5).

**Series.sample():** Returns n randomly selected elements (n defaults to 1).

**Series.describe():** Returns a statistical summary of the Series.

**Series.sort_index():** Sorts the Series by its index.

**Series.sort_values():** Sorts the Series by its values.

**Series.isnull():** Checks, element by element, whether each value is null (NaN or None).

**Series.fillna():** Fills null values with a given value.

**Series.dropna():** Removes null values from the Series.

**Series.isin():** Checks, element by element, whether each value is contained in a given list of values.
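Several entries in the list above (`values`, `index`, `dtype`, `describe()` and the null-handling methods) are not exercised in the cells that follow. A minimal sketch to try them; the Series `s`, its labels and values are made up for illustration, and `None` is included on purpose so the null-handling methods have something to act on.

```python
import numpy as np
import pandas as pd

# Hypothetical example Series; None is stored as NaN, so the dtype becomes float64
s = pd.Series([10, None, 30, 40], index=["a", "b", "c", "d"])

print(s.values)      # the values as a NumPy array
print(s.index)       # the labels 'a', 'b', 'c', 'd'
print(s.dtype)       # float64
print(s.describe())  # count, mean, std, min, quartiles, max (NaN is skipped)

print(s.isnull())        # True only for label "b"
print(s.fillna(0))       # NaN replaced by 0.0
print(s.dropna())        # label "b" removed
print(s.isin([10, 40]))  # True for labels "a" and "d"
```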

In [3]:
np.random.seed(42)

seri = pd.Series(np.random.randint(0, 50, 8))
seri

0    38
1    28
2    14
3    42
4     7
5    20
6    38
7    18
dtype: int32

In [4]:
seri.size

8

In [5]:
seri.shape

(8,)

In [6]:
seri.ndim

1

In [7]:
seri.head() # n defaults to 5, so with no argument head() returns the first 5 elements.

0    38
1    28
2    14
3    42
4     7
dtype: int32

In [8]:
seri.head(3)

0    38
1    28
2    14
dtype: int32

In [9]:
seri.head(10) # asking for more elements than exist does not raise an error; the whole Series is returned.

0    38
1    28
2    14
3    42
4     7
5    20
6    38
7    18
dtype: int32

In [10]:
seri.tail() # returns the last 5 elements of the Series.

3    42
4     7
5    20
6    38
7    18
dtype: int32

In [19]:
seri.sample() # returns one randomly chosen element.

6    38
dtype: int32

In [18]:
seri.sample(3)

4     7
2    14
5    20
dtype: int32
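Because `sample()` draws at random, the elements above will differ from run to run. Passing `random_state` (a documented parameter of `Series.sample`) makes the draw reproducible. Note also that, unlike `head(10)` above, asking `sample()` for more elements than the Series holds raises a `ValueError` unless `replace=True`. A minimal sketch, rebuilding the same seed-42 series:

```python
import numpy as np
import pandas as pd

np.random.seed(42)
seri = pd.Series(np.random.randint(0, 50, 8))

# The same random_state always selects the same elements
first = seri.sample(3, random_state=1)
second = seri.sample(3, random_state=1)
print(first.equals(second))  # True

# More draws than elements is only allowed when sampling with replacement
with_replacement = seri.sample(10, replace=True, random_state=1)
print(len(with_replacement))  # 10
```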

In [20]:
seri.sort_values()

4     7
2    14
7    18
5    20
1    28
0    38
6    38
3    42
dtype: int32

In [21]:
seri.sort_values(ascending=False) # ascending defaults to True; with False the values are sorted from largest to smallest.

3    42
0    38
6    38
1    28
5    20
7    18
2    14
4     7
dtype: int32

In [22]:
seri.sort_index()

0    38
1    28
2    14
3    42
4     7
5    20
6    38
7    18
dtype: int32

In [23]:
seri.sort_index(ascending=False)

7    18
6    38
5    20
4     7
3    42
2    14
1    28
0    38
dtype: int32
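A common reason to sort is to pick the largest or smallest n values. Chaining `sort_values` with `head` does this, and `Series.nlargest` / `Series.nsmallest` are built-in shortcuts for the same task. A minimal sketch on the same series:

```python
import numpy as np
import pandas as pd

np.random.seed(42)
seri = pd.Series(np.random.randint(0, 50, 8))

# Two ways to get the three largest values
top3_sorted = seri.sort_values(ascending=False).head(3)
top3_direct = seri.nlargest(3)

# And the two smallest values
bottom2 = seri.nsmallest(2)

print(top3_direct)
print(top3_sorted.tolist() == top3_direct.tolist())  # True
print(bottom2)
```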

# <font color='green'> <b>DataFrames </b><font color='black'>

## <font color='blue'> <b>Creating a DataFrame</b><font color='black'>

A DataFrame is a two-dimensional collection of data.

It is a data structure that stores data in tabular form.

The data is arranged in rows and columns, and a single DataFrame can hold several data series (columns) side by side.

We can think of a DataFrame as a set of Series objects brought together so that they share the same index.
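To make the "Series sharing one index" view concrete: building a DataFrame from a dict of Series aligns them on their index labels, and selecting a single column gives back a Series carrying that shared index. A minimal sketch; the names `ages` and `cities` and their values are hypothetical.

```python
import pandas as pd

ages = pd.Series([20, 30, 40], index=["x", "y", "z"])
# cities has no entry for "z" but has an extra label "w"
cities = pd.Series(["Ankara", "İzmir", "Bursa"], index=["x", "y", "w"])

# The rows become the union of both indexes; missing combinations become NaN
df = pd.DataFrame({"age": ages, "city": cities})
print(df)

# One column of the DataFrame is again a Series with the shared index
col = df["age"]
print(type(col).__name__)          # Series
print(col.index.equals(df.index))  # True
```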

We can perform many operations on a DataFrame: selecting rows and columns, adding new rows and columns, and arithmetic on its values.

DataFrames can also be imported from external storage such as a SQL database, a CSV file or an Excel file.

[SOURCE01](https://www.tutorialspoint.com/python_pandas/python_pandas_dataframe.htm), 
[SOURCE02](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html), 
[SOURCE03](https://morioh.com/p/2528ac775b1b), 
[SOURCE04](https://www.datacamp.com/community/tutorials/pandas-tutorial-dataframe-python), 
[SOURCE05](https://www.guru99.com/python-pandas-tutorial.html), 
[SOURCE06](https://realpython.com/pandas-dataframe/) &
[SOURCE07](https://towardsdatascience.com/a-simple-guide-to-pandas-dataframes-b125f64e1453)<br>
[VIDEO SOURCE01](https://www.youtube.com/watch?v=zmdjNSmRXF4), 
[VIDEO SOURCE02](https://www.youtube.com/watch?v=F6kmIpWWEdU) &
[VIDEO SOURCE03](https://towardsdatascience.com/pandas-dataframe-basics-3c16eb35c4f3)<br>
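The overview above mentions loading DataFrames from external storage. A minimal, self-contained sketch: the CSV text and the `people` table are made up, an in-memory `io.StringIO` stands in for a CSV file, and an in-memory SQLite database (Python's built-in `sqlite3`) stands in for a SQL server. Excel files are read similarly with `pd.read_excel`, which needs an extra engine such as openpyxl and is not shown here.

```python
import io
import sqlite3

import pandas as pd

# CSV: read_csv accepts a file path or any file-like object
csv_text = "Name,Age\nAyşe,20\nAhmet,30\n"
df_csv = pd.read_csv(io.StringIO(csv_text))

# SQL: read_sql runs a query over a database connection (sqlite3 here)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (Name TEXT, Age INTEGER)")
conn.executemany("INSERT INTO people VALUES (?, ?)", [("Mehmet", 40), ("Veli", 50)])
df_sql = pd.read_sql("SELECT * FROM people", conn)
conn.close()

print(df_csv)
print(df_sql)
```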

### <font color='blue'> <b>Creating a DataFrame Using the Lists of Data & Columns</b><font color='black'>

In [6]:
liste_1 = [[1, 2, 3],[4, 5, 6]]
column_name = ["A", "B", "C"]
df = pd.DataFrame(data=liste_1, columns=column_name)
df

|   | A | B | C |
|---|---|---|---|
| 0 | 1 | 2 | 3 |
| 1 | 4 | 5 | 6 |


In [7]:
df = pd.DataFrame(data=liste_1) # without columns, the column labels default to 0, 1, 2
df

|   | 0 | 1 | 2 |
|---|---|---|---|
| 0 | 1 | 2 | 3 |
| 1 | 4 | 5 | 6 |


In [32]:
liste_1 = [[1, 2, 3],[4, 5, 6]]
column_name = ["A", "B", "C", "D"]
df = pd.DataFrame(data=liste_1, columns=column_name) # raises an error: 4 column names are given but the data has only 3 columns
df

ValueError: 4 columns passed, passed data had 3 columns

In [None]:
?pd.DataFrame

### <font color='blue'> <b>Creating a DataFrame Using a NumPy Array</b><font color='black'>

In [36]:
arr1 = np.arange(1,27,3).reshape(3,3)
arr1

array([[ 1,  4,  7],
       [10, 13, 16],
       [19, 22, 25]])

In [37]:
arr1.ndim

2

In [39]:
df = pd.DataFrame(arr1)
df

|   | 0 | 1 | 2 |
|---|---|---|---|
| 0 | 1 | 4 | 7 |
| 1 | 10 | 13 | 16 |
| 2 | 19 | 22 | 25 |


In [40]:
df = pd.DataFrame(data = arr1)
df

|   | 0 | 1 | 2 |
|---|---|---|---|
| 0 | 1 | 4 | 7 |
| 1 | 10 | 13 | 16 |
| 2 | 19 | 22 | 25 |


In [42]:
df = pd.DataFrame(arr1, columns=("a", "b", "c"), index=("x", "y", "z"))
df

|   | a | b | c |
|---|---|---|---|
| x | 1 | 4 | 7 |
| y | 10 | 13 | 16 |
| z | 19 | 22 | 25 |


In [43]:
df = pd.DataFrame(arr1, ("a", "b", "c"),("x", "y", "z")) # so the 2nd positional argument is index and the 3rd is columns
df

|   | x | y | z |
|---|---|---|---|
| a | 1 | 4 | 7 |
| b | 10 | 13 | 16 |
| c | 19 | 22 | 25 |


### <font color='blue'> <b>Creating a DataFrame Using a Dictionary</b><font color='black'>

In [9]:
data = {'Name':['Ayşe', 'Ahmet', 'Mehmet'],'Age':[20,30,40]}

In [10]:
pd.Series(data)

Name    [Ayşe, Ahmet, Mehmet]
Age              [20, 30, 40]
dtype: object

In [11]:
pd.DataFrame(data=data) # keyword form: parameter name = argument

|   | Name | Age |
|---|---|---|
| 0 | Ayşe | 20 |
| 1 | Ahmet | 30 |
| 2 | Mehmet | 40 |


In [12]:
pd.DataFrame(data, columns=["Name", "Age", "Job"])

|   | Name | Age | Job |
|---|---|---|---|
| 0 | Ayşe | 20 | NaN |
| 1 | Ahmet | 30 | NaN |
| 2 | Mehmet | 40 | NaN |


In [13]:
aa = pd.DataFrame(data, columns=["aa", "bb", "cc"]) # none of these names is a key in data
aa

|   | aa | bb | cc |
|---|---|---|---|


In [14]:
aa.shape

(0, 3)

In [15]:
df = pd.DataFrame(data, columns=["Name", "bb", "cc"]) # only "Name" is a key in data
df

|   | Name | bb | cc |
|---|---|---|---|
| 0 | Ayşe | NaN | NaN |
| 1 | Ahmet | NaN | NaN |
| 2 | Mehmet | NaN | NaN |


In [16]:
df.shape

(3, 3)
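The cells above show how `columns` interacts with a dict: it selects and orders the dict keys, keys that are not listed are dropped, and listed names that are not keys become all-NaN columns. When none of the listed names matches a key, there is no data to take rows from, which is why `aa.shape` is `(0, 3)`. A minimal check of the selecting, ordering and NaN-filling behaviour, reusing the same `data` dict:

```python
import pandas as pd

data = {'Name': ['Ayşe', 'Ahmet', 'Mehmet'], 'Age': [20, 30, 40]}

# 'Name' is not listed, so it is dropped; 'Job' is not a key, so it is all NaN
df = pd.DataFrame(data, columns=["Age", "Job"])

print(list(df.columns))        # ['Age', 'Job'], in the order given by columns
print(df["Job"].isna().all())  # True
```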

## <font color='blue'> <b>Basic Attributes & Methods of DataFrames</b><font color='black'>

In [3]:
dict_1= {'Name':['Ayşe', 'Ahmet', 'Mehmet'],'Age':[20,30,40]}

In [4]:
df = pd.DataFrame(dict_1)
df

|   | Name | Age |
|---|---|---|
| 0 | Ayşe | 20 |
| 1 | Ahmet | 30 |
| 2 | Mehmet | 40 |


In [5]:
df.head(2)

|   | Name | Age |
|---|---|---|
| 0 | Ayşe | 20 |
| 1 | Ahmet | 30 |


In [6]:
df.tail(2)

|   | Name | Age |
|---|---|---|
| 1 | Ahmet | 30 |
| 2 | Mehmet | 40 |


In [7]:
df.sample()

|   | Name | Age |
|---|---|---|
| 1 | Ahmet | 30 |


In [8]:
df.columns

Index(['Name', 'Age'], dtype='object')

In [9]:
df.columns[1]

'Age'

In [68]:
df.columns[2]

IndexError: index 2 is out of bounds for axis 0 with size 2

In [69]:
df.columns[0:2]

Index(['Name', 'Age'], dtype='object')

In [71]:
df.index

RangeIndex(start=0, stop=3, step=1)

In [10]:
df.mean()

TypeError: Could not convert ['AyşeAhmetMehmet'] to numeric

In [11]:
df.Age

0    20
1    30
2    40
Name: Age, dtype: int64

In [79]:
df.Age.mean()

30.0
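`df.mean()` fails above because the `Name` column holds strings; in pandas 2.x non-numeric columns are no longer skipped silently, so a TypeError is raised. Passing `numeric_only=True` restricts the computation to numeric columns. A minimal sketch with the same data:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Ayşe', 'Ahmet', 'Mehmet'], 'Age': [20, 30, 40]})

# Only numeric columns are averaged; the string column 'Name' is skipped
means = df.mean(numeric_only=True)
print(means)         # a Series with a single entry for Age
print(means["Age"])  # 30.0
```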

## <font color='blue'> <b>Indexing, Slicing & Selection</b><font color='black'>

[Source01](https://pandas.pydata.org/docs/user_guide/indexing.html),
[Source02](https://www.geeksforgeeks.org/slicing-indexing-manipulating-and-cleaning-pandas-dataframe/),
[Source03](https://www.tutorialspoint.com/python_pandas/python_pandas_indexing_and_selecting_data.htm),
[Source04](https://www.dataquest.io/blog/tutorial-indexing-dataframes-in-pandas/)

In [13]:
data={"isim":["Ali", "Ayşe", "Fatma", "Veli"], "boy cm":[170,160,170,180], "kilo-kg":[70, 55, 60, 80]}

In [14]:
df = pd.DataFrame(data, index= ["A", "B", "C", "D"] )
df

|   | isim | boy cm | kilo-kg |
|---|---|---|---|
| A | Ali | 170 | 70 |
| B | Ayşe | 160 | 55 |
| C | Fatma | 170 | 60 |
| D | Veli | 180 | 80 |


In [15]:
df.shape

(4, 3)

In [16]:
df.isim

A      Ali
B     Ayşe
C    Fatma
D     Veli
Name: isim, dtype: object

In [17]:
df["isim"]

A      Ali
B     Ayşe
C    Fatma
D     Veli
Name: isim, dtype: object

In [18]:
df[["isim"]]

|   | isim |
|---|---|
| A | Ali |
| B | Ayşe |
| C | Fatma |
| D | Veli |


In [19]:
df.columns

Index(['isim', 'boy cm', 'kilo-kg'], dtype='object')

In [20]:
df.boy cm # an underscore instead of the space would work; spaces or special characters break attribute access

SyntaxError: invalid syntax (2177829930.py, line 1)

In [21]:
df["boy cm"] # the safer way: bracket access works for any column name

A    170
B    160
C    170
D    180
Name: boy cm, dtype: int64

In [24]:
a = df["kilo-kg"]
type(a)

pandas.core.series.Series

In [25]:
b = df[["kilo-kg"]]
type(b) # double brackets return a DataFrame, not a Series

pandas.core.frame.DataFrame

In [26]:
b = df[["kilo-kg", "boy cm"]]
b

|   | kilo-kg | boy cm |
|---|---|---|
| A | 70 | 170 |
| B | 55 | 160 |
| C | 60 | 170 |
| D | 80 | 180 |


In [27]:
pd.concat([df, a], axis=1)

|   | isim | boy cm | kilo-kg | kilo-kg |
|---|---|---|---|---|
| A | Ali | 170 | 70 | 70 |
| B | Ayşe | 160 | 55 | 55 |
| C | Fatma | 170 | 60 | 60 |
| D | Veli | 180 | 80 | 80 |
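`pd.concat(..., axis=1)` does not rename clashing column names, so the concatenated frame carries two columns labelled `kilo-kg`. Selecting a duplicated label then returns a DataFrame with both columns instead of a Series. A minimal sketch reproducing this with the same data:

```python
import pandas as pd

data = {"isim": ["Ali", "Ayşe", "Fatma", "Veli"], "boy cm": [170, 160, 170, 180], "kilo-kg": [70, 55, 60, 80]}
df = pd.DataFrame(data, index=["A", "B", "C", "D"])
a = df["kilo-kg"]

# The Series keeps its name "kilo-kg", which becomes a second column with the same label
combined = pd.concat([df, a], axis=1)
print(list(combined.columns))

# Because the label is duplicated, selecting it yields a 4x2 DataFrame
print(type(combined["kilo-kg"]).__name__)  # DataFrame
```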


In [28]:
df["A"] # a single label inside [] is looked up as a column name, so this raises KeyError

KeyError: 'A'

In [22]:
df["A":"C"] # a slice inside [] selects rows instead; label slices include the end label "C"

|   | isim | boy cm | kilo-kg |
|---|---|---|---|
| A | Ali | 170 | 70 |
| B | Ayşe | 160 | 55 |
| C | Fatma | 170 | 60 |


In [23]:
df[0:1]

|   | isim | boy cm | kilo-kg |
|---|---|---|---|
| A | Ali | 170 | 70 |
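Since plain `df[...]` treats a single label as a column but a slice as rows, the explicit indexers are usually clearer: `loc` selects by label (a label slice includes its end) and `iloc` selects by position (a position slice excludes its end). A minimal sketch on the same frame:

```python
import pandas as pd

data = {"isim": ["Ali", "Ayşe", "Fatma", "Veli"], "boy cm": [170, 160, 170, 180], "kilo-kg": [70, 55, 60, 80]}
df = pd.DataFrame(data, index=["A", "B", "C", "D"])

row_a = df.loc["A"]               # one row as a Series (no KeyError, unlike df["A"])
rows_a_to_c = df.loc["A":"C"]     # labels A, B and C: the end label is included
first_row = df.iloc[0:1]          # position 0 only: the end position is excluded
cell = df.loc["B", "boy cm"]      # a single value by row label and column label
subset = df.loc[["A", "D"], ["isim", "kilo-kg"]]  # chosen rows and columns

print(rows_a_to_c)
print(cell)  # 160
```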


## <font color='blue'> <b>Creating a New Column</b><font color='black'>

In [24]:
df

|   | isim | boy cm | kilo-kg |
|---|---|---|---|
| A | Ali | 170 | 70 |
| B | Ayşe | 160 | 55 |
| C | Fatma | 170 | 60 |
| D | Veli | 180 | 80 |


In [25]:
df["BMI"] = df["kilo-kg"] / (df["boy cm"] / 100) ** 2

In [26]:
df

|   | isim | boy cm | kilo-kg | BMI |
|---|---|---|---|---|
| A | Ali | 170 | 70 | 24.221453 |
| B | Ayşe | 160 | 55 | 21.484375 |
| C | Fatma | 170 | 60 | 20.761246 |
| D | Veli | 180 | 80 | 24.691358 |
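The `BMI` column applies the body mass index formula, weight in kilograms divided by the square of height in metres, which is why `boy cm` is divided by 100 first. Checking the first row by hand reproduces the value in the table:

$$
\mathrm{BMI} = \frac{\text{kilo-kg}}{\left(\text{boy cm}/100\right)^2},
\qquad
\mathrm{BMI}_A = \frac{70}{1.70^2} = \frac{70}{2.89} \approx 24.22
$$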


In [27]:
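df.new = np.arange(4) # this sets a plain Python attribute, not a column; pandas only warns

UserWarning: Pandas doesn't allow columns to be created via a new attribute name - see https://pandas.pydata.org/pandas-docs/stable/indexing.html#attribute-access
  df.new = np.arange(4)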


In [28]:
df

|   | isim | boy cm | kilo-kg | BMI |
|---|---|---|---|---|
| A | Ali | 170 | 70 | 24.221453 |
| B | Ayşe | 160 | 55 | 21.484375 |
| C | Fatma | 170 | 60 | 20.761246 |
| D | Veli | 180 | 80 | 24.691358 |


In [29]:
df["new"] = np.arange(4)

In [30]:
df

|   | isim | boy cm | kilo-kg | BMI | new |
|---|---|---|---|---|---|
| A | Ali | 170 | 70 | 24.221453 | 0 |
| B | Ayşe | 160 | 55 | 21.484375 | 1 |
| C | Fatma | 170 | 60 | 20.761246 | 2 |
| D | Veli | 180 | 80 | 24.691358 | 3 |


In [31]:
df["BMI"] = round(df["kilo-kg"] / (df["boy cm"] / 100) ** 2)
df

|   | isim | boy cm | kilo-kg | BMI | new |
|---|---|---|---|---|---|
| A | Ali | 170 | 70 | 24.0 | 0 |
| B | Ayşe | 160 | 55 | 21.0 | 1 |
| C | Fatma | 170 | 60 | 21.0 | 2 |
| D | Veli | 180 | 80 | 25.0 | 3 |


In [37]:
# df["kategori"] = ["kilolusunuz" if df["BMI"] > 20 else "normalsiniz"]  # fails: a whole Series cannot be used as an if condition ("truth value of a Series is ambiguous")

df['kategori'] = ['kilolusunuz' if x > 20 else 'normalsiniz' for x in df['BMI']]
df

|   | isim | boy cm | kilo-kg | BMI | new | kategori |
|---|---|---|---|---|---|---|
| A | Ali | 170 | 70 | 24.0 | 0 | kilolusunuz |
| B | Ayşe | 160 | 55 | 21.0 | 1 | kilolusunuz |
| C | Fatma | 170 | 60 | 21.0 | 2 | kilolusunuz |
| D | Veli | 180 | 80 | 25.0 | 3 | kilolusunuz |
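The commented-out attempt fails because `df["BMI"] > 20` is a whole boolean Series, while Python's `if ... else` needs a single True/False. The list comprehension works element by element; `np.where` is the vectorized equivalent. A minimal sketch that rebuilds only the rounded `BMI` values shown above (the labels "kilolusunuz" / "normalsiniz" mean "you are overweight" / "you are normal" and are kept as data):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"BMI": [24.0, 21.0, 21.0, 25.0]}, index=["A", "B", "C", "D"])

# np.where takes the 2nd argument where the condition is True, otherwise the 3rd
df["kategori"] = np.where(df["BMI"] > 20, "kilolusunuz", "normalsiniz")

# Same result as the list comprehension in the cell above
expected = ["kilolusunuz" if x > 20 else "normalsiniz" for x in df["BMI"]]
print(df["kategori"].tolist() == expected)  # True
```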


In [36]:
df = pd.read_csv("adult_eda.csv")
df


|   | age | workclass | fnlwgt | education | education-num | marital-status | occupation | relationship | race | sex | capital-gain | capital-loss | hours-per-week | native-country | salary |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 39 | State-gov | 77516 | Bachelors | 13.0 | Never-married | Adm-clerical | Not-in-family | White | Male | 2174 | 0 | 40 | United-States | <=50K |
| 1 | 50 | Self-emp-not-inc | 83311 | Bachelors | 13.0 | Married-civ-spouse | Exec-managerial | Husband | White | Male | 0 | 0 | 13 | United-States | <=50K |
| 2 | 38 | Private | 215646 | HS-grad | 9.0 | Divorced | Handlers-cleaners | Not-in-family | White | Male | 0 | 0 | 40 | United-States | <=50K |
| 3 | 53 | Private | 234721 | 11th | 7.0 | Married-civ-spouse | Handlers-cleaners | Husband | Black | Male | 0 | 0 | 40 | United-States | <=50K |
| 4 | 28 | Private | 338409 | Bachelors | 13.0 | Married-civ-spouse | Prof-specialty | Wife | Black | Female | 0 | 0 | 40 | Cuba | <=50K |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 32556 | 27 | Private | 257302 | Assoc-acdm | 12.0 | Married-civ-spouse | Tech-support | Wife | White | Female | 0 | 0 | 38 | United-States | <=50K |
| 32557 | 40 | Private | 154374 | HS-grad | 9.0 | Married-civ-spouse | Machine-op-inspct | Husband | White | Male | 0 | 0 | 40 | United-States | >50K |
| 32558 | 58 | Private | 151910 | HS-grad | 9.0 | Widowed | Adm-clerical | Unmarried | White | Female | 0 | 0 | 40 | United-States | <=50K |
| 32559 | 22 | Private | 201490 | HS-grad | 9.0 | Never-married | Adm-clerical | NaN | White | Male | 0 | 0 | 20 | United-States | <=50K |
| 32560 | 52 | Self-emp-inc | 287927 | HS-grad | 9.0 | Married-civ-spouse | Exec-managerial | Wife | White | Female | 15024 | 0 | 40 | United-States | >50K |

32561 rows × 15 columns


In [120]:
df.head()

|   | age | workclass | fnlwgt | education | education-num | marital-status | occupation | relationship | race | sex | capital-gain | capital-loss | hours-per-week | native-country | salary |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 39 | State-gov | 77516 | Bachelors | 13.0 | Never-married | Adm-clerical | Not-in-family | White | Male | 2174 | 0 | 40 | United-States | <=50K |
| 1 | 50 | Self-emp-not-inc | 83311 | Bachelors | 13.0 | Married-civ-spouse | Exec-managerial | Husband | White | Male | 0 | 0 | 13 | United-States | <=50K |
| 2 | 38 | Private | 215646 | HS-grad | 9.0 | Divorced | Handlers-cleaners | Not-in-family | White | Male | 0 | 0 | 40 | United-States | <=50K |
| 3 | 53 | Private | 234721 | 11th | 7.0 | Married-civ-spouse | Handlers-cleaners | Husband | Black | Male | 0 | 0 | 40 | United-States | <=50K |
| 4 | 28 | Private | 338409 | Bachelors | 13.0 | Married-civ-spouse | Prof-specialty | Wife | Black | Female | 0 | 0 | 40 | Cuba | <=50K |


In [121]:
df.tail()

|   | age | workclass | fnlwgt | education | education-num | marital-status | occupation | relationship | race | sex | capital-gain | capital-loss | hours-per-week | native-country | salary |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 32556 | 27 | Private | 257302 | Assoc-acdm | 12.0 | Married-civ-spouse | Tech-support | Wife | White | Female | 0 | 0 | 38 | United-States | <=50K |
| 32557 | 40 | Private | 154374 | HS-grad | 9.0 | Married-civ-spouse | Machine-op-inspct | Husband | White | Male | 0 | 0 | 40 | United-States | >50K |
| 32558 | 58 | Private | 151910 | HS-grad | 9.0 | Widowed | Adm-clerical | Unmarried | White | Female | 0 | 0 | 40 | United-States | <=50K |
| 32559 | 22 | Private | 201490 | HS-grad | 9.0 | Never-married | Adm-clerical | NaN | White | Male | 0 | 0 | 20 | United-States | <=50K |
| 32560 | 52 | Self-emp-inc | 287927 | HS-grad | 9.0 | Married-civ-spouse | Exec-managerial | Wife | White | Female | 15024 | 0 | 40 | United-States | >50K |


In [37]:
df.dtype # dtype is a Series attribute; a DataFrame has dtypes (one per column) instead

AttributeError: 'DataFrame' object has no attribute 'dtype'

In [130]:
df.dtypes

age                 int64
workclass          object
fnlwgt              int64
education          object
education-num     float64
marital-status     object
occupation         object
relationship       object
race               object
sex                object
capital-gain        int64
capital-loss        int64
hours-per-week      int64
native-country     object
salary             object
dtype: object

In [38]:
df["age"].dtype

dtype('int64')

In [39]:
df.size # total number of values: 32561 rows * 15 columns

488415

In [40]:
df.shape

(32561, 15)

In [41]:
df[32559:32560]["relationship"]

32559    NaN
Name: relationship, dtype: object

In [42]:
df.ndim

2

In [43]:
df.sample()

|   | age | workclass | fnlwgt | education | education-num | marital-status | occupation | relationship | race | sex | capital-gain | capital-loss | hours-per-week | native-country | salary |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 27364 | 32 | Private | 61898 | 11th | 7.0 | Divorced | Other-service | Unmarried | White | Female | 0 | 0 | 15 | United-States | <=50K |


In [44]:
df.sample(10)

|   | age | workclass | fnlwgt | education | education-num | marital-status | occupation | relationship | race | sex | capital-gain | capital-loss | hours-per-week | native-country | salary |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1813 | 22 | Private | 311764 | 11th | 7.0 | Widowed | Sales | NaN | Black | Female | 0 | 0 | 35 | United-States | <=50K |
| 7846 | 33 | Private | 53373 | HS-grad | 9.0 | Married-civ-spouse | Craft-repair | Husband | White | Male | 0 | 0 | 35 | United-States | <=50K |
| 2406 | 48 | Private | 36503 | Some-college | 10.0 | Married-civ-spouse | Exec-managerial | Husband | White | Male | 0 | 0 | 40 | United-States | >50K |
| 11111 | 28 | Private | 263614 | Bachelors | 13.0 | Married-civ-spouse | Exec-managerial | Husband | White | Male | 0 | 0 | 40 | United-States | >50K |
| 30418 | 45 | Private | 187033 | HS-grad | 9.0 | Married-civ-spouse | Craft-repair | Husband | White | Male | 0 | 2051 | 40 | United-States | <=50K |
| 5111 | 31 | Private | 363130 | HS-grad | 9.0 | Never-married | Other-service | Unmarried | Black | Male | 0 | 0 | 18 | United-States | <=50K |
| 6984 | 36 | Private | 168276 | 10th | 6.0 | Divorced | Other-service | Not-in-family | White | Female | 0 | 0 | 40 | United-States | <=50K |
| 26832 | 51 | Private | 205884 | Some-college | 10.0 | Married-civ-spouse | Tech-support | Wife | White | Female | 0 | 0 | 40 | United-States | >50K |
| 3127 | 25 | Private | 131976 | Bachelors | 13.0 | Never-married | Exec-managerial | NaN | White | Male | 0 | 0 | 55 | United-States | <=50K |
| 2593 | 39 | Private | 315776 | Masters | 14.0 | Never-married | Exec-managerial | Not-in-family | Black | Male | 8614 | 0 | 52 | United-States | >50K |


In [45]:
df.sample(50)["age"].mean()

40.82

In [46]:
df.isnull()

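|   | age | workclass | fnlwgt | education | education-num | marital-status | occupation | relationship | race | sex | capital-gain | capital-loss | hours-per-week | native-country | salary |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False |
| 1 | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False |
| 2 | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False |
| 3 | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False |
| 4 | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 32556 | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False |
| 32557 | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False |
| 32558 | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False |
| 32559 | False | False | False | False | False | False | False | True | False | False | False | False | False | False | False |
| 32560 | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False |

32561 rows × 15 columns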


In [47]:
df.isnull().sum()

age                  0
workclass            0
fnlwgt               0
education            0
education-num      802
marital-status       0
occupation           0
relationship      5068
race                 0
sex                  0
capital-gain         0
capital-loss         0
hours-per-week       0
native-country       0
salary               0
dtype: int64
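Raw counts are easier to judge relative to the number of rows. Because `isnull()` yields booleans, `.mean()` gives the fraction of missing values per column. For the adult data above, `relationship` works out to about 15.6% (5068 / 32561). A minimal sketch on a small hypothetical frame, so the `adult_eda.csv` file is not needed:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with a few gaps
df = pd.DataFrame({
    "age": [39, 50, 38, 53],
    "education-num": [13.0, np.nan, 9.0, 7.0],
    "relationship": ["Husband", None, None, "Wife"],
})

missing_count = df.isnull().sum()        # number of missing values per column
missing_pct = df.isnull().mean() * 100   # share of missing values per column, in %

print(missing_count)
print(missing_pct)  # age 0.0, education-num 25.0, relationship 50.0
```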

In [146]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 32561 entries, 0 to 32560
Data columns (total 15 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   age             32561 non-null  int64  
 1   workclass       32561 non-null  object 
 2   fnlwgt          32561 non-null  int64  
 3   education       32561 non-null  object 
 4   education-num   31759 non-null  float64
 5   marital-status  32561 non-null  object 
 6   occupation      32561 non-null  object 
 7   relationship    27493 non-null  object 
 8   race            32561 non-null  object 
 9   sex             32561 non-null  object 
 10  capital-gain    32561 non-null  int64  
 11  capital-loss    32561 non-null  int64  
 12  hours-per-week  32561 non-null  int64  
 13  native-country  32561 non-null  object 
 14  salary          32561 non-null  object 
dtypes: float64(1), int64(5), object(9)
memory usage: 3.7+ MB
