# **Data Manipulation**

## **NumPy**

* Short for Numerical Python.
* Used for scientific computing.
* Provides high-performance operations on arrays, multidimensional arrays, and matrices.
* Its foundations were laid in 1995 (matrix-sig, Guido van Rossum), and it took its final form in 2005 (Travis Oliphant).
* Arrays resemble Python lists; the difference lies in efficient data storage and vectorized operations.

## Why NumPy?

Without NumPy, two vectors are multiplied element by element as follows:

In [3]:
a = [1,2,3,4]
b = [2,3,4,5]

In [6]:
ab = []

for i in range(0, len(a)):
    ab.append(a[i]*b[i])

ab

[2, 6, 12, 20]

With NumPy, the same operation becomes much simpler:

In [8]:
import numpy as np

a = np.array([1,2,3,4])
b = np.array([2,3,4,5])

a*b

array([ 2,  6, 12, 20])

NumPy is also much more efficient than the plain-list approach in terms of data storage.
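The storage claim can be checked with a rough measurement. The sketch below is only an illustration: the element count `n = 1000` and the `int64` dtype are arbitrary choices, and `sys.getsizeof` gives an approximate picture of list memory on CPython.

```python
import sys
import numpy as np

n = 1000
py_list = list(range(n))
arr = np.arange(n, dtype=np.int64)  # dtype fixed explicitly so the size is platform-independent

# The list stores n pointers plus one full Python int object per element.
list_bytes = sys.getsizeof(py_list) + sum(sys.getsizeof(x) for x in py_list)

# The array stores n raw 8-byte integers in one contiguous buffer.
array_bytes = arr.nbytes

print(list_bytes, array_bytes)
```

The exact list size depends on the Python build, but the array always needs 8 bytes per element here.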

## Creating a NumPy Array

In [9]:
import numpy as np

In [10]:
np.array([1,2,3,4,5])

array([1, 2, 3, 4, 5])

In [11]:
a = np.array([1,2,3,4,5])

In [12]:
type(a)

numpy.ndarray

In [13]:
np.array([3.14,4,2,13])

array([ 3.14,  4.  ,  2.  , 13.  ])

NumPy does not store type information for each element separately; it stores a single type for the whole array, which is what makes its storage efficient. In the array above, all elements were converted to float.

In [14]:
np.array([3.14,4,2,13], dtype = "int")

array([ 3,  4,  2, 13])

Because we chose the element type ourselves here, all elements were converted to integers (the fractional part of 3.14 is dropped).
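A small sketch of the two conversions described above. The variable names are illustrative, and `int64` is chosen explicitly so the result does not depend on the platform default.

```python
import numpy as np

mixed = np.array([3.14, 4, 2, 13])                  # the ints are upcast to float
as_int = mixed.astype("int64")                      # explicit conversion drops the fractional part
negative = np.array([-3.99, 3.99]).astype("int64")  # truncation is toward zero, not rounding

print(mixed.dtype, as_int, negative)
```

Note that float-to-int conversion truncates, so `np.round` is needed first if rounding is the intent.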

## Creating Arrays from Scratch

In [16]:
np.zeros(10, dtype = "int")

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])

In [18]:
np.ones((3,5), dtype = "int")

array([[1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1]])

In [20]:
np.full((3,5), 3, dtype = "int")

array([[3, 3, 3, 3, 3],
       [3, 3, 3, 3, 3],
       [3, 3, 3, 3, 3]])

In [22]:
np.arange(0,31,3)

array([ 0,  3,  6,  9, 12, 15, 18, 21, 24, 27, 30])

In [23]:
np.linspace(0,1,10)

array([0.        , 0.11111111, 0.22222222, 0.33333333, 0.44444444,
       0.55555556, 0.66666667, 0.77777778, 0.88888889, 1.        ])

In [24]:
np.random.normal(10,4,(3,4))

array([[ 7.24692719, 10.06074469, 11.68311323, 12.9169039 ],
       [ 7.80392918, 12.02787575,  3.94605995,  7.17922076],
       [ 7.12635759, 13.87109062,  2.766452  ,  6.87936802]])

np.random.normal(mean, standard deviation, (array shape))

In [26]:
np.random.randint(0,10,(3,3))

array([[7, 6, 9],
       [8, 3, 1],
       [6, 1, 8]])

np.random.randint(start value, end value (exclusive), (array shape))
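Random outputs like the ones above change on every run. One way to make them reproducible is a seeded generator; this sketch uses `np.random.default_rng`, which is not used in the original notes, and the seed 42 is an arbitrary choice.

```python
import numpy as np

rng = np.random.default_rng(42)               # seeded generator
draws = rng.integers(0, 10, size=(3, 3))      # like randint: low inclusive, high exclusive

rng_again = np.random.default_rng(42)         # same seed, same sequence
same_draws = rng_again.integers(0, 10, size=(3, 3))

print(np.array_equal(draws, same_draws))
```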

## NumPy Array Attributes

* ndim: number of dimensions
* shape: size of each dimension
* size: total number of elements
* dtype: data type of the array

In [28]:
import numpy as np

In [30]:
a = np.random.randint(10, size = 10)

In [41]:
a

array([0, 9, 6, 4, 7, 3, 0, 6, 1, 9])

In [31]:
a.ndim

1

In [32]:
a.shape

(10,)

In [33]:
a.size

10

In [34]:
a.dtype

dtype('int32')

In [35]:
b = np.random.randint(10, size = (3,5))

In [36]:
b

array([[3, 2, 9, 9, 2],
       [1, 5, 4, 6, 4],
       [6, 0, 2, 7, 3]])

In [37]:
b.ndim

2

In [38]:
b.shape

(3, 5)

In [39]:
b.size

15

In [40]:
b.dtype

dtype('int32')

## Reshaping

In [264]:
import numpy as np

In [265]:
np.arange(1,10)

array([1, 2, 3, 4, 5, 6, 7, 8, 9])

In [266]:
np.arange(1,10).reshape((3,3))

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

In [267]:
a = np.arange(1,10)

In [268]:
a

array([1, 2, 3, 4, 5, 6, 7, 8, 9])

In [269]:
a.ndim

1

In [270]:
b = a.reshape((1,9))

In [271]:
b

array([[1, 2, 3, 4, 5, 6, 7, 8, 9]])

In [272]:
b.ndim

2
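Two related idioms may be useful alongside the reshape calls above: letting NumPy infer one dimension with `-1`, and adding an axis with `np.newaxis`. The names `grid`, `row`, and `col` are illustrative.

```python
import numpy as np

a = np.arange(1, 10)

grid = a.reshape(3, -1)     # -1 lets NumPy infer the missing dimension (3 here)
row = a[np.newaxis, :]      # shape (1, 9), same result as a.reshape((1, 9))
col = a[:, np.newaxis]      # shape (9, 1)

print(grid.shape, row.shape, col.shape)
```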

## Concatenating Arrays

In [69]:
import numpy as np

In [70]:
x = np.array([1,2,3])
y = np.array([4,5,6])

In [71]:
np.concatenate([x,y])

array([1, 2, 3, 4, 5, 6])

In [72]:
z = np.array([7,8,9])

In [73]:
np.concatenate([x,y,z])

array([1, 2, 3, 4, 5, 6, 7, 8, 9])

**In two dimensions:**

In [74]:
a = np.array([[1,2,3],
              [4,5,6]])

In [75]:
np.concatenate([a,a])

array([[1, 2, 3],
       [4, 5, 6],
       [1, 2, 3],
       [4, 5, 6]])

In [76]:
np.concatenate([a,a], axis = 1)

array([[1, 2, 3, 1, 2, 3],
       [4, 5, 6, 4, 5, 6]])

## Splitting Arrays

In [274]:
import numpy as np

In [275]:
x = np.array([1,2,3,99,99,3,2,1])

In [276]:
np.split(x, [3,5])

[array([1, 2, 3]), array([99, 99]), array([3, 2, 1])]

In [80]:
a,b,c = np.split(x, [3,5])

In [81]:
a

array([1, 2, 3])

In [82]:
b

array([99, 99])

In [83]:
c

array([3, 2, 1])

**Splitting in two dimensions:**

In [273]:
m = np.arange(16).reshape(4,4)
m

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15]])

In [86]:
np.vsplit(m, [2])

[array([[0, 1, 2, 3],
        [4, 5, 6, 7]]),
 array([[ 8,  9, 10, 11],
        [12, 13, 14, 15]])]

In [87]:
ust, alt = np.vsplit(m, [2])

In [88]:
ust

array([[0, 1, 2, 3],
       [4, 5, 6, 7]])

In [89]:
alt

array([[ 8,  9, 10, 11],
       [12, 13, 14, 15]])

In [90]:
m

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15]])

In [91]:
np.hsplit(m, [2])

[array([[ 0,  1],
        [ 4,  5],
        [ 8,  9],
        [12, 13]]),
 array([[ 2,  3],
        [ 6,  7],
        [10, 11],
        [14, 15]])]

In [92]:
sol, sag = np.hsplit(m, [2])

In [93]:
sol

array([[ 0,  1],
       [ 4,  5],
       [ 8,  9],
       [12, 13]])

In [94]:
sag

array([[ 2,  3],
       [ 6,  7],
       [10, 11],
       [14, 15]])

## Sorting Arrays

In [2]:
import numpy as np

In [96]:
v = np.array([2,1,4,6,8,3,5])

In [97]:
v

array([2, 1, 4, 6, 8, 3, 5])

In [98]:
np.sort(v)

array([1, 2, 3, 4, 5, 6, 8])

In [99]:
v.sort()

In [100]:
v

array([1, 2, 3, 4, 5, 6, 8])

np.sort(v) did not modify the data; it only returned a sorted copy. To keep the sorted result we would have to assign it back to a variable. v.sort(), on the other hand, sorted the array in place.
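The difference between the two calls can be verified directly. The sketch also shows `np.argsort`, which is not covered in the notes but returns the index order rather than the sorted values.

```python
import numpy as np

v = np.array([2, 1, 4, 6, 8, 3, 5])

sorted_copy = np.sort(v)        # new array; v itself is unchanged
order = np.argsort(v)           # indices that would sort v
descending = np.sort(v)[::-1]   # reverse the sorted copy for descending order
unchanged = v.tolist()          # snapshot taken before the in-place sort

v.sort()                        # in-place: v is now sorted
```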

**Sorting a two-dimensional array:**

In [3]:
m = np.random.normal(20,5,(3,3))
m

array([[16.2464383 , 17.93137726, 11.52372554],
       [23.95430734,  9.20436902, 19.69527505],
       [32.07978688, 19.89512467, 15.10047603]])

In [4]:
np.sort(m, axis = 1)

array([[11.52372554, 16.2464383 , 17.93137726],
       [ 9.20436902, 19.69527505, 23.95430734],
       [15.10047603, 19.89512467, 32.07978688]])

In [5]:
np.sort(m, axis = 0)

array([[16.2464383 ,  9.20436902, 11.52372554],
       [23.95430734, 17.93137726, 15.10047603],
       [32.07978688, 19.89512467, 19.69527505]])

## Accessing Elements by Index

In [105]:
import numpy as np
a = np.random.randint(10, size = 10)
a

array([4, 2, 5, 0, 4, 3, 2, 3, 0, 6])

In [106]:
a[0]

4

In [107]:
a[-1]

6

In [108]:
a[0] = 100

In [109]:
a

array([100,   2,   5,   0,   4,   3,   2,   3,   0,   6])

In [110]:
m = np.random.randint(10, size = (3,5))
m

array([[8, 4, 8, 0, 7],
       [4, 4, 1, 5, 9],
       [1, 5, 8, 1, 5]])

In [111]:
m[0,0]

8

In [112]:
m[1,1]

4

In [113]:
m[1,4] = 99

In [114]:
m

array([[ 8,  4,  8,  0,  7],
       [ 4,  4,  1,  5, 99],
       [ 1,  5,  8,  1,  5]])

In [115]:
m[1,3] = 2.2

In [116]:
m

array([[ 8,  4,  8,  0,  7],
       [ 4,  4,  1,  2, 99],
       [ 1,  5,  8,  1,  5]])

Because a NumPy array holds elements of a single type, the float 2.2 was converted to an integer (2) when it was assigned. Had m been created with a float element in it, all of its elements would have been floats.

## Accessing Elements by Slicing (Array Subsets)

In [117]:
import numpy as np

In [119]:
a = np.arange(20,30)
a

array([20, 21, 22, 23, 24, 25, 26, 27, 28, 29])

In [120]:
a[0:3]

array([20, 21, 22])

In [121]:
a[:3]

array([20, 21, 22])

In [122]:
a[3:]

array([23, 24, 25, 26, 27, 28, 29])

In [123]:
a[1::2]

array([21, 23, 25, 27, 29])

In [124]:
a[0::2]

array([20, 22, 24, 26, 28])

In [126]:
a[0::3]

array([20, 23, 26, 29])

**Two-dimensional slicing:**

In [127]:
m = np.random.randint(10, size = (5,5))
m

array([[8, 9, 4, 9, 2],
       [7, 4, 7, 2, 5],
       [4, 1, 6, 3, 9],
       [2, 7, 2, 8, 0],
       [6, 7, 7, 0, 9]])

In [128]:
m[:,0]

array([8, 7, 4, 2, 6])

In [129]:
m[:,1]

array([9, 4, 1, 7, 7])

In [131]:
m[0,:]

array([8, 9, 4, 9, 2])

In [132]:
m[1,:]

array([7, 4, 7, 2, 5])

In [133]:
m[0:2, 0:3]

array([[8, 9, 4],
       [7, 4, 7]])

In [135]:
m[:, 0:2]

array([[8, 9],
       [7, 4],
       [4, 1],
       [2, 7],
       [6, 7]])

## Operating on Subsets

In [136]:
import numpy as np

In [137]:
a = np.random.randint(10, size = (5,5))
a

array([[1, 1, 9, 8, 7],
       [9, 5, 8, 1, 0],
       [5, 1, 4, 6, 5],
       [0, 2, 3, 2, 0],
       [8, 9, 8, 0, 9]])

In [139]:
alt_a = a[0:3, 0:2]
alt_a

array([[1, 1],
       [9, 5],
       [5, 1]])

In [140]:
alt_a[0,0] = 9999
alt_a[1,1] = 8888

In [141]:
alt_a

array([[9999,    1],
       [   9, 8888],
       [   5,    1]])

In [142]:
a

array([[9999,    1,    9,    8,    7],
       [   9, 8888,    8,    1,    0],
       [   5,    1,    4,    6,    5],
       [   0,    2,    3,    2,    0],
       [   8,    9,    8,    0,    9]])

In [143]:
b = np.random.randint(10, size = (5,5))
b

array([[8, 7, 4, 0, 8],
       [2, 4, 1, 5, 6],
       [9, 3, 5, 1, 1],
       [4, 4, 7, 6, 3],
       [3, 0, 5, 8, 7]])

In [144]:
alt_b = b[0:3, 0:2].copy()

In [145]:
alt_b[1,1] = 9999
alt_b[0,0] = 8888

In [146]:
alt_b

array([[8888,    7],
       [   2, 9999],
       [   9,    3]])

In [147]:
b

array([[8, 7, 4, 0, 8],
       [2, 4, 1, 5, 6],
       [9, 3, 5, 1, 1],
       [4, 4, 7, 6, 3],
       [3, 0, 5, 8, 7]])
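The two experiments above differ because a slice is a view onto the original array, while `.copy()` allocates independent storage. A minimal check, using a deterministic array instead of random values:

```python
import numpy as np

a = np.arange(25).reshape(5, 5)

view = a[0:3, 0:2]            # basic slicing returns a view onto a's buffer
copy = a[0:3, 0:2].copy()     # .copy() allocates independent storage

view[0, 0] = 9999             # also changes a[0, 0]
copy[1, 1] = 8888             # leaves a untouched

print(np.shares_memory(a, view), np.shares_memory(a, copy))
```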

## Accessing Elements with Fancy Indexing

In [148]:
import numpy as np
v = np.arange(0,30,3)
v

array([ 0,  3,  6,  9, 12, 15, 18, 21, 24, 27])

In [149]:
v[1]

3

In [150]:
v[3]

9

In [154]:
[v[1], v[3], v[5]]

[3, 9, 15]

In [155]:
al_getir = [1,3,5]

In [156]:
v[al_getir]

array([ 3,  9, 15])

**Fancy indexing in two dimensions:**

In [158]:
m = np.arange(9).reshape((3,3))
m

array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])

In [159]:
satir = np.array([0,1])
sutun = np.array([1,2])

In [160]:
m[satir,sutun]

array([1, 5])

**Simple index combined with fancy index**

In [161]:
m

array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])

In [164]:
m[0, [1,2]]

array([1, 2])

**Slice combined with fancy index**

In [165]:
m[0:, [1,2]]

array([[1, 2],
       [4, 5],
       [7, 8]])
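Note that `m[satir, sutun]` pairs the index arrays element by element, so it returns the values at (0,1) and (1,2). If the intent is the full sub-block of rows 0-1 and columns 1-2, `np.ix_` is one option; it is not used in the original notes and is shown here only as a sketch.

```python
import numpy as np

m = np.arange(9).reshape((3, 3))
satir = np.array([0, 1])
sutun = np.array([1, 2])

pairs = m[satir, sutun]             # elements (0,1) and (1,2)
block = m[np.ix_(satir, sutun)]     # rows 0-1 crossed with columns 1-2

print(pairs)
print(block)
```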

## Conditional Element Operations

In [166]:
import numpy as np

In [167]:
v = np.array([1,2,3,4,5])

In [169]:
v > 3

array([False, False, False,  True,  True])

In [170]:
v[v > 3]

array([4, 5])
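Boolean masks can also be combined. The sketch below uses `&` and `|` for element-wise "and" / "or", plus `np.where` for conditional replacement; the threshold values are arbitrary.

```python
import numpy as np

v = np.array([1, 2, 3, 4, 5])

both = v[(v > 1) & (v < 5)]      # parentheses are required around each condition
either = v[(v < 2) | (v > 4)]
capped = np.where(v > 3, 3, v)   # replace values above 3 with 3, keep the rest

print(both, either, capped)
```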

In [171]:
v*5/10

array([0.5, 1. , 1.5, 2. , 2.5])

## Mathematical Operations

In [172]:
import numpy as np

In [173]:
v = np.array([1,2,3,4,5])

In [174]:
v - 1

array([0, 1, 2, 3, 4])

In [175]:
v * 5

array([ 5, 10, 15, 20, 25])

In [176]:
v*5/10-1

array([-0.5,  0. ,  0.5,  1. ,  1.5])

In [183]:
v**2

array([ 1,  4,  9, 16, 25], dtype=int32)

In [184]:
v % 2

array([1, 0, 1, 0, 1], dtype=int32)

**ufunc**

In [177]:
np.subtract(v,1)

array([0, 1, 2, 3, 4])

In [178]:
np.add(v, 1)

array([2, 3, 4, 5, 6])

In [179]:
np.multiply(v, 10)

array([10, 20, 30, 40, 50])

In [181]:
np.divide(v, 5)

array([0.2, 0.4, 0.6, 0.8, 1. ])

In [182]:
np.power(v, 2)

array([ 1,  4,  9, 16, 25], dtype=int32)

In [185]:
np.mod(v, 2)

array([1, 0, 1, 0, 1], dtype=int32)

In [186]:
np.absolute(np.array([-3]))

array([3])

In [187]:
np.sin(360)

0.9589157234143065

In [188]:
np.cos(180)

-0.5984600690578581
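The trigonometric results above look unexpected because np.sin and np.cos take their argument in radians, so 360 and 180 are read as radians. If degrees are intended, the angle can be converted first, for example with `np.deg2rad`:

```python
import numpy as np

sin_360 = np.sin(np.deg2rad(360))   # approximately 0, up to floating-point error
cos_180 = np.cos(np.deg2rad(180))   # -1.0

print(sin_360, cos_180)
```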

In [189]:
v = np.array([1,2,3])

In [190]:
np.log(v)

array([0.        , 0.69314718, 1.09861229])

In [192]:
np.log2(v)

array([0.       , 1.       , 1.5849625])

## Solving a System of Two Equations with NumPy

In [194]:
import numpy as np

**5 * x0 + x1 = 12**

**x0 + 3 * x1 = 10**

In [197]:
a = np.array([[5,1], [1,3]])
b = np.array([12,10])

In [201]:
x = np.linalg.solve(a, b)
x

array([1.85714286, 2.71428571])
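The solution can be checked by substituting it back into the system: multiplying the coefficient matrix by x should reproduce the right-hand side. The exact solution is x0 = 13/7 and x1 = 19/7.

```python
import numpy as np

a = np.array([[5, 1], [1, 3]])
b = np.array([12, 10])
x = np.linalg.solve(a, b)

residual_ok = np.allclose(a @ x, b)   # True when a @ x reproduces b
print(x, residual_ok)
```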

## **Pandas**

* The name is derived from "Panel Data".
* An open-source Python library written for data manipulation and data analysis.
* It grew out of econometric and financial work.
* Its foundations were laid in 2008.
* It brought R's DataFrame structure to the Python world and made it possible to work on DataFrames quickly and effectively.
* It can read and write many different data formats.

## Creating a Pandas Series

In [3]:
import pandas as pd

In [4]:
pd.Series([1,2,3,4,5])

0    1
1    2
2    3
3    4
4    5
dtype: int64

The first column shows the index of each element in the Series.

In [5]:
seri = pd.Series([1,2,3,4,5])

In [6]:
type(seri)

pandas.core.series.Series

In [7]:
seri.axes

[RangeIndex(start=0, stop=5, step=1)]

This is the index information of the Series.

In [8]:
seri.dtype

dtype('int64')

In [9]:
seri.size

5

In [10]:
seri.ndim

1

In [11]:
seri.values

array([1, 2, 3, 4, 5], dtype=int64)

In [14]:
seri.head(3)

0    1
1    2
2    3
dtype: int64

The first x observations of a Series are retrieved with .head(x).

In [13]:
seri.tail(3)

2    3
3    4
4    5
dtype: int64

The last x observations of a Series are retrieved with .tail(x).

**Naming the index**

In [15]:
pd.Series([99,32,456,324,1])

0     99
1     32
2    456
3    324
4      1
dtype: int64

In [16]:
pd.Series([99,32,456,324,1], index = [1,3,5,7,9])

1     99
3     32
5    456
7    324
9      1
dtype: int64

In [17]:
pd.Series([99,32,456,324,1], index = ["a","b","c","d","e"])

a     99
b     32
c    456
d    324
e      1
dtype: int64

In [19]:
seri = pd.Series([99,32,456,324,1], index = ["a","b","c","d","e"])

In [20]:
seri["a"]

99

In [21]:
seri["a":"c"]

a     99
b     32
c    456
dtype: int64

**Creating a Series from a dictionary:**

In [24]:
sozluk = {"reg":10, "log":11, "cart":12}

In [25]:
seri = pd.Series(sozluk)

In [26]:
seri

reg     10
log     11
cart    12
dtype: int64

**Creating a Series by concatenating two Series:**

In [28]:
sozluk = {"reg":10, "log":11, "cart":12}

In [29]:
seri = pd.Series(sozluk)

In [281]:
pd.concat([seri,seri])

reg     10
log     11
cart    12
reg     10
log     11
cart    12
dtype: int64

## Element Operations

In [32]:
import numpy as np
a = np.array([1,55,86,999,213])
seri = pd.Series(a)
seri

0      1
1     55
2     86
3    999
4    213
dtype: int32

In [33]:
seri[0]

1

In [34]:
seri[0:3]

0     1
1    55
2    86
dtype: int32

In [35]:
seri = pd.Series([3213,4234,435,23], 
                 index = ["reg", "loj", "cart", "rf"])

In [36]:
seri

reg     3213
loj     4234
cart     435
rf        23
dtype: int64

In [37]:
seri.index

Index(['reg', 'loj', 'cart', 'rf'], dtype='object')

In [283]:
seri.keys()

Index(['reg', 'loj', 'cart', 'rf'], dtype='object')

In [40]:
list(seri.items())

[('reg', 3213), ('loj', 4234), ('cart', 435), ('rf', 23)]

In [41]:
seri.values

array([3213, 4234,  435,   23], dtype=int64)

**Membership test (index labels):**

In [42]:
"reg" in seri

True

In [43]:
"a" in seri

False

**Fancy indexing:**

In [44]:
seri[["rf","reg"]]

rf       23
reg    3213
dtype: int64

In [45]:
seri["reg"] = 130

In [46]:
seri

reg      130
loj     4234
cart     435
rf        23
dtype: int64

In [48]:
seri["reg":"cart"]

reg      130
loj     4234
cart     435
dtype: int64
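Label slices such as `seri["reg":"cart"]` include the end label, unlike positional slices. A short sketch contrasting the two, using the explicit `.loc` and `.iloc` accessors:

```python
import pandas as pd

seri = pd.Series([130, 4234, 435, 23], index=["reg", "loj", "cart", "rf"])

by_label = seri.loc["reg":"cart"]   # label slice: "cart" is included
by_position = seri.iloc[0:2]        # positional slice: position 2 is excluded

print(by_label)
print(by_position)
```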

## Creating a Pandas DataFrame

In [49]:
import pandas as pd

In [52]:
l = [1,3,242,99,128]
l

[1, 3, 242, 99, 128]

In [53]:
pd.DataFrame(l, columns = ["degisken_ismi"])

|   | degisken_ismi |
| --- | --- |
| 0 | 1 |
| 1 | 3 |
| 2 | 242 |
| 3 | 99 |
| 4 | 128 |


In [54]:
import numpy as np
m = np.arange(1,10).reshape((3,3))

In [55]:
pd.DataFrame(m, columns = ["var1", "var2", "var3"])

|   | var1 | var2 | var3 |
| --- | --- | --- | --- |
| 0 | 1 | 2 | 3 |
| 1 | 4 | 5 | 6 |
| 2 | 7 | 8 | 9 |


**Renaming DataFrame columns**

In [57]:
df = pd.DataFrame(m, columns = ["var1", "var2", "var3"])
df.head()

|   | var1 | var2 | var3 |
| --- | --- | --- | --- |
| 0 | 1 | 2 | 3 |
| 1 | 4 | 5 | 6 |
| 2 | 7 | 8 | 9 |


In [58]:
df.columns = ("deg1", "deg2", "deg3")

In [59]:
df

|   | deg1 | deg2 | deg3 |
| --- | --- | --- | --- |
| 0 | 1 | 2 | 3 |
| 1 | 4 | 5 | 6 |
| 2 | 7 | 8 | 9 |


In [61]:
type(df)

pandas.core.frame.DataFrame

In [62]:
df.axes

[RangeIndex(start=0, stop=3, step=1),
 Index(['deg1', 'deg2', 'deg3'], dtype='object')]

In [65]:
df.shape

(3, 3)

In [66]:
df.ndim

2

In [67]:
df.size

9

In [68]:
df.values

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

In [69]:
type(df.values)

numpy.ndarray

## Element Operations

In [5]:
import pandas as pd
import numpy as np
s1 = np.random.randint(10, size = 5)
s2 = np.random.randint(10, size = 5)
s3 = np.random.randint(10, size = 5)

In [6]:
sozluk = {"var1":s1, "var2":s2, "var3":s3}

In [7]:
sozluk

{'var1': array([2, 2, 3, 6, 2]),
 'var2': array([7, 6, 7, 5, 3]),
 'var3': array([2, 4, 8, 7, 1])}

In [8]:
df = pd.DataFrame(sozluk)

In [9]:
df

|   | var1 | var2 | var3 |
| --- | --- | --- | --- |
| 0 | 2 | 7 | 2 |
| 1 | 2 | 6 | 4 |
| 2 | 3 | 7 | 8 |
| 3 | 6 | 5 | 7 |
| 4 | 2 | 3 | 1 |


In [12]:
df[0:1]

|   | var1 | var2 | var3 |
| --- | --- | --- | --- |
| 0 | 2 | 7 | 2 |


In [292]:
df.index

RangeIndex(start=0, stop=5, step=1)

In [293]:
df.index = ["a","b","c","d","e"]

In [294]:
df

|   | var1 | var2 | var3 |
| --- | --- | --- | --- |
| a | 5 | 1 | 8 |
| b | 7 | 5 | 1 |
| c | 5 | 5 | 5 |
| d | 6 | 8 | 6 |
| e | 2 | 2 | 2 |


In [295]:
df["c":"e"]

|   | var1 | var2 | var3 |
| --- | --- | --- | --- |
| c | 5 | 5 | 5 |
| d | 6 | 8 | 6 |
| e | 2 | 2 | 2 |


**Deleting rows:**

In [296]:
df.drop("a", axis = 0)

|   | var1 | var2 | var3 |
| --- | --- | --- | --- |
| b | 7 | 5 | 1 |
| c | 5 | 5 | 5 |
| d | 6 | 8 | 6 |
| e | 2 | 2 | 2 |


In [297]:
df

|   | var1 | var2 | var3 |
| --- | --- | --- | --- |
| a | 5 | 1 | 8 |
| b | 7 | 5 | 1 |
| c | 5 | 5 | 5 |
| d | 6 | 8 | 6 |
| e | 2 | 2 | 2 |


In [298]:
df.drop("a", axis = 0, inplace = True)

In [299]:
df

|   | var1 | var2 | var3 |
| --- | --- | --- | --- |
| b | 7 | 5 | 1 |
| c | 5 | 5 | 5 |
| d | 6 | 8 | 6 |
| e | 2 | 2 | 2 |


inplace = True applies the change to the DataFrame itself instead of returning a modified copy.
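An equivalent pattern is to reassign the result of `drop`. The sketch below uses a small made-up DataFrame (the values are illustrative, not the ones from the cells above).

```python
import pandas as pd

df = pd.DataFrame({"var1": [5, 7, 5], "var2": [1, 5, 5]},
                  index=["a", "b", "c"])   # made-up values

dropped = df.drop("a", axis=0)    # returns a new DataFrame
still_there = "a" in df.index     # True: df itself was not modified

df = df.drop("a", axis=0)         # reassignment has the same effect as inplace=True
print(df)
```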

**Deleting with a list of labels (fancy index):**

In [300]:
l = ["c","d"]
df.drop(l, axis = 0)

|   | var1 | var2 | var3 |
| --- | --- | --- | --- |
| b | 7 | 5 | 1 |
| e | 2 | 2 | 2 |


**For variables (columns):**

In [301]:
df

|   | var1 | var2 | var3 |
| --- | --- | --- | --- |
| b | 7 | 5 | 1 |
| c | 5 | 5 | 5 |
| d | 6 | 8 | 6 |
| e | 2 | 2 | 2 |


In [302]:
"var1" in df

True

In [303]:
l = ["var1","var4","var2"]

In [304]:
for i in l:
    print(i in df)

True
False
True


In [305]:
df

|   | var1 | var2 | var3 |
| --- | --- | --- | --- |
| b | 7 | 5 | 1 |
| c | 5 | 5 | 5 |
| d | 6 | 8 | 6 |
| e | 2 | 2 | 2 |


In [306]:
df["var4"] = df["var1"] / df["var2"]

In [307]:
df

|   | var1 | var2 | var3 | var4 |
| --- | --- | --- | --- | --- |
| b | 7 | 5 | 1 | 1.4 |
| c | 5 | 5 | 5 | 1.0 |
| d | 6 | 8 | 6 | 0.75 |
| e | 2 | 2 | 2 | 1.0 |


**Deleting a variable:**

In [310]:
df.drop("var4", axis = 1)

|   | var1 | var2 | var3 |
| --- | --- | --- | --- |
| b | 7 | 5 | 1 |
| c | 5 | 5 | 5 |
| d | 6 | 8 | 6 |
| e | 2 | 2 | 2 |


In [95]:
df

|   | var1 | var2 | var3 | var4 |
| --- | --- | --- | --- | --- |
| b | 8 | 6 | 2 | 1.333333 |
| c | 2 | 1 | 4 | 2.0 |
| d | 8 | 5 | 4 | 1.6 |
| e | 8 | 9 | 3 | 0.888889 |


In [96]:
df.drop("var4", axis = 1, inplace = True)

In [97]:
df

|   | var1 | var2 | var3 |
| --- | --- | --- | --- |
| b | 8 | 6 | 2 |
| c | 2 | 1 | 4 |
| d | 8 | 5 | 4 |
| e | 8 | 9 | 3 |


## Selecting Observations and Variables: loc & iloc

In [98]:
import numpy as np
import pandas as pd
m =np.random.randint(1,30, size = (10,3))
df = pd.DataFrame(m, columns = ["var1","var2","var3"])
df

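|   | var1 | var2 | var3 |
| --- | --- | --- | --- |
| 0 | 9 | 18 | 11 |
| 1 | 17 | 15 | 17 |
| 2 | 12 | 17 | 8 |
| 3 | 8 | 10 | 3 |
| 4 | 14 | 22 | 4 |
| 5 | 18 | 13 | 3 |
| 6 | 29 | 5 | 25 |
| 7 | 27 | 22 | 18 |
| 8 | 18 | 5 | 26 |
| 9 | 18 | 25 | 28 |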


**loc:** Selects by label, that is, by the index exactly as it is defined; the end label of a slice is included.

In [99]:
df.loc[0:3]

|   | var1 | var2 | var3 |
| --- | --- | --- | --- |
| 0 | 9 | 18 | 11 |
| 1 | 17 | 15 | 17 |
| 2 | 12 | 17 | 8 |
| 3 | 8 | 10 | 3 |


**iloc:** Selects by integer position, following the usual Python indexing logic; the end position of a slice is excluded.

In [100]:
df.iloc[0:3]

|   | var1 | var2 | var3 |
| --- | --- | --- | --- |
| 0 | 9 | 18 | 11 |
| 1 | 17 | 15 | 17 |
| 2 | 12 | 17 | 8 |


In [102]:
df.iloc[:3,:2]

|   | var1 | var2 |
| --- | --- | --- |
| 0 | 9 | 18 |
| 1 | 17 | 15 |
| 2 | 12 | 17 |


In [104]:
df.loc[0:3, "var3"]

0    11
1    17
2     8
3     3
Name: var3, dtype: int32

In [106]:
df.iloc[0:3, 2]

0    11
1    17
2     8
Name: var3, dtype: int32

In [107]:
df.iloc[0:3]["var3"]

0    11
1    17
2     8
Name: var3, dtype: int32
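With the default 0..n-1 index, loc and iloc look almost identical. The difference is easier to see when the labels are not positions; the sketch below uses made-up labels 10, 20, 30, 40 for that purpose.

```python
import pandas as pd

df = pd.DataFrame({"var1": [9, 17, 12, 8]},
                  index=[10, 20, 30, 40])   # made-up labels

by_label = df.loc[10:30]      # labels 10, 20 and 30 (end label included)
by_position = df.iloc[0:2]    # positions 0 and 1 (end position excluded)

print(by_label)
print(by_position)
```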

## Conditional Element Operations

In [108]:
import numpy as np
import pandas as pd
m =np.random.randint(1,30, size = (10,3))
df = pd.DataFrame(m, columns = ["var1","var2","var3"])
df

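|   | var1 | var2 | var3 |
| --- | --- | --- | --- |
| 0 | 18 | 26 | 10 |
| 1 | 2 | 14 | 6 |
| 2 | 4 | 9 | 14 |
| 3 | 5 | 20 | 9 |
| 4 | 18 | 3 | 17 |
| 5 | 14 | 21 | 26 |
| 6 | 3 | 15 | 28 |
| 7 | 4 | 7 | 16 |
| 8 | 25 | 19 | 15 |
| 9 | 13 | 11 | 28 |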


In [109]:
df["var1"]

0    18
1     2
2     4
3     5
4    18
5    14
6     3
7     4
8    25
9    13
Name: var1, dtype: int32

In [110]:
df["var1"][0:2]

0    18
1     2
Name: var1, dtype: int32

In [111]:
df[0:2]["var1"]

0    18
1     2
Name: var1, dtype: int32

In [112]:
df[0:2][["var1","var2"]]

|   | var1 | var2 |
| --- | --- | --- |
| 0 | 18 | 26 |
| 1 | 2 | 14 |


In [116]:
df[df.var1 > 15]

|   | var1 | var2 | var3 |
| --- | --- | --- | --- |
| 0 | 18 | 26 | 10 |
| 4 | 18 | 3 | 17 |
| 8 | 25 | 19 | 15 |


In [119]:
df[(df.var1 > 15) & (df.var3 < 15)]

|   | var1 | var2 | var3 |
| --- | --- | --- | --- |
| 0 | 18 | 26 | 10 |


In [120]:
df.loc[(df.var1 > 15),["var1","var2"]]

|   | var1 | var2 |
| --- | --- | --- |
| 0 | 18 | 26 |
| 4 | 18 | 3 |
| 8 | 25 | 19 |


In [121]:
df[(df.var1 > 15)][["var1","var2"]]

|   | var1 | var2 |
| --- | --- | --- |
| 0 | 18 | 26 |
| 4 | 18 | 3 |
| 8 | 25 | 19 |


## Join Operations

In [1]:
import numpy as np
import pandas as pd
m =np.random.randint(1,30, size = (5,3))
df1 = pd.DataFrame(m, columns = ["var1","var2","var3"])
df1

|   | var1 | var2 | var3 |
| --- | --- | --- | --- |
| 0 | 8 | 7 | 29 |
| 1 | 24 | 25 | 16 |
| 2 | 8 | 10 | 12 |
| 3 | 6 | 16 | 22 |
| 4 | 28 | 14 | 14 |


In [2]:
df2 = df1 + 100
df2

|   | var1 | var2 | var3 |
| --- | --- | --- | --- |
| 0 | 108 | 107 | 129 |
| 1 | 124 | 125 | 116 |
| 2 | 108 | 110 | 112 |
| 3 | 106 | 116 | 122 |
| 4 | 128 | 114 | 114 |


In [3]:
pd.concat([df1,df2])

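|   | var1 | var2 | var3 |
| --- | --- | --- | --- |
| 0 | 8 | 7 | 29 |
| 1 | 24 | 25 | 16 |
| 2 | 8 | 10 | 12 |
| 3 | 6 | 16 | 22 |
| 4 | 28 | 14 | 14 |
| 0 | 108 | 107 | 129 |
| 1 | 124 | 125 | 116 |
| 2 | 108 | 110 | 112 |
| 3 | 106 | 116 | 122 |
| 4 | 128 | 114 | 114 |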


In [4]:
pd.concat([df1,df2], ignore_index = True)

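|   | var1 | var2 | var3 |
| --- | --- | --- | --- |
| 0 | 8 | 7 | 29 |
| 1 | 24 | 25 | 16 |
| 2 | 8 | 10 | 12 |
| 3 | 6 | 16 | 22 |
| 4 | 28 | 14 | 14 |
| 5 | 108 | 107 | 129 |
| 6 | 124 | 125 | 116 |
| 7 | 108 | 110 | 112 |
| 8 | 106 | 116 | 122 |
| 9 | 128 | 114 | 114 |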


In [5]:
df1.columns

Index(['var1', 'var2', 'var3'], dtype='object')

In [6]:
df2.columns = ["var1", "var2", "deg3"]

In [7]:
df2

|   | var1 | var2 | deg3 |
| --- | --- | --- | --- |
| 0 | 108 | 107 | 129 |
| 1 | 124 | 125 | 116 |
| 2 | 108 | 110 | 112 |
| 3 | 106 | 116 | 122 |
| 4 | 128 | 114 | 114 |


In [8]:
df1

|   | var1 | var2 | var3 |
| --- | --- | --- | --- |
| 0 | 8 | 7 | 29 |
| 1 | 24 | 25 | 16 |
| 2 | 8 | 10 | 12 |
| 3 | 6 | 16 | 22 |
| 4 | 28 | 14 | 14 |


In [9]:
pd.concat([df1,df2])

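|   | var1 | var2 | var3 | deg3 |
| --- | --- | --- | --- | --- |
| 0 | 8 | 7 | 29.0 | NaN |
| 1 | 24 | 25 | 16.0 | NaN |
| 2 | 8 | 10 | 12.0 | NaN |
| 3 | 6 | 16 | 22.0 | NaN |
| 4 | 28 | 14 | 14.0 | NaN |
| 0 | 108 | 107 | NaN | 129.0 |
| 1 | 124 | 125 | NaN | 116.0 |
| 2 | 108 | 110 | NaN | 112.0 |
| 3 | 106 | 116 | NaN | 122.0 |
| 4 | 128 | 114 | NaN | 114.0 |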


In [10]:
pd.concat([df1,df2], join = "inner")

|   | var1 | var2 |
| --- | --- | --- |
| 0 | 8 | 7 |
| 1 | 24 | 25 |
| 2 | 8 | 10 |
| 3 | 6 | 16 |
| 4 | 28 | 14 |
| 0 | 108 | 107 |
| 1 | 124 | 125 |
| 2 | 108 | 110 |
| 3 | 106 | 116 |
| 4 | 128 | 114 |


## Advanced Merge Operations

In [2]:
import pandas as pd

**One-to-one merge:** Every key value appears in both datasets.

In [143]:
df1 = pd.DataFrame({"calisanlar": ["Ali", "Veli", "Ayse", "Fatma"],
                   "grup": ["Muhasebe", "Muhendislik", "Muhendislik", "IK"]})
df1

|   | calisanlar | grup |
| --- | --- | --- |
| 0 | Ali | Muhasebe |
| 1 | Veli | Muhendislik |
| 2 | Ayse | Muhendislik |
| 3 | Fatma | IK |


In [144]:
df2 = pd.DataFrame({"calisanlar": ["Ayse", "Ali", "Veli", "Fatma"],
                   "ilk_giris": [2010,2009,2014,2019]})
df2

|   | calisanlar | ilk_giris |
| --- | --- | --- |
| 0 | Ayse | 2010 |
| 1 | Ali | 2009 |
| 2 | Veli | 2014 |
| 3 | Fatma | 2019 |


In [145]:
pd.merge(df1,df2)

|   | calisanlar | grup | ilk_giris |
| --- | --- | --- | --- |
| 0 | Ali | Muhasebe | 2009 |
| 1 | Veli | Muhendislik | 2014 |
| 2 | Ayse | Muhendislik | 2010 |
| 3 | Fatma | IK | 2019 |


In [146]:
pd.merge(df1,df2, on = "calisanlar")

|   | calisanlar | grup | ilk_giris |
| --- | --- | --- | --- |
| 0 | Ali | Muhasebe | 2009 |
| 1 | Veli | Muhendislik | 2014 |
| 2 | Ayse | Muhendislik | 2010 |
| 3 | Fatma | IK | 2019 |


**Many-to-one**

In [148]:
df3 = pd.merge(df1,df2)
df3

|   | calisanlar | grup | ilk_giris |
| --- | --- | --- | --- |
| 0 | Ali | Muhasebe | 2009 |
| 1 | Veli | Muhendislik | 2014 |
| 2 | Ayse | Muhendislik | 2010 |
| 3 | Fatma | IK | 2019 |


In [149]:
df4 = pd.DataFrame({"grup":["Muhasebe", "Muhendislik", "IK"],
                   "mudur":["Caner", "Berkay", "Mustafa"]})
df4

|   | grup | mudur |
| --- | --- | --- |
| 0 | Muhasebe | Caner |
| 1 | Muhendislik | Berkay |
| 2 | IK | Mustafa |


In [150]:
pd.merge(df3,df4)

|   | calisanlar | grup | ilk_giris | mudur |
| --- | --- | --- | --- | --- |
| 0 | Ali | Muhasebe | 2009 | Caner |
| 1 | Veli | Muhendislik | 2014 | Berkay |
| 2 | Ayse | Muhendislik | 2010 | Berkay |
| 3 | Fatma | IK | 2019 | Mustafa |


**Many-to-many**

In [151]:
df5 = pd.DataFrame({"grup": ["Muhasebe", "Muhasebe", "Muhendislik", "Muhendislik", "IK", "IK"],
                   "yetenekler":["Matematik", "Excel", "Kodlama", "Linux", "Excel", "Yonetim"]})
df5

|   | grup | yetenekler |
| --- | --- | --- |
| 0 | Muhasebe | Matematik |
| 1 | Muhasebe | Excel |
| 2 | Muhendislik | Kodlama |
| 3 | Muhendislik | Linux |
| 4 | IK | Excel |
| 5 | IK | Yonetim |


In [152]:
pd.merge(df1,df5)

|   | calisanlar | grup | yetenekler |
| --- | --- | --- | --- |
| 0 | Ali | Muhasebe | Matematik |
| 1 | Ali | Muhasebe | Excel |
| 2 | Veli | Muhendislik | Kodlama |
| 3 | Veli | Muhendislik | Linux |
| 4 | Ayse | Muhendislik | Kodlama |
| 5 | Ayse | Muhendislik | Linux |
| 6 | Fatma | IK | Excel |
| 7 | Fatma | IK | Yonetim |
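All merges above use the default inner join, which keeps only keys present in both frames. The sketch below shows what happens when keys do not match, using the `how` and `indicator` parameters of `pd.merge`. The group "Pazarlama" is a made-up value added so that one employee has no manager row.

```python
import pandas as pd

df1 = pd.DataFrame({"calisanlar": ["Ali", "Veli", "Ayse"],
                    "grup": ["Muhasebe", "Muhendislik", "Pazarlama"]})  # "Pazarlama" is hypothetical
df4 = pd.DataFrame({"grup": ["Muhasebe", "Muhendislik", "IK"],
                    "mudur": ["Caner", "Berkay", "Mustafa"]})

inner = pd.merge(df1, df4, on="grup")               # default how="inner": matching keys only
left = pd.merge(df1, df4, on="grup", how="left")    # keep every employee, NaN where no manager
outer = pd.merge(df1, df4, on="grup", how="outer",
                 indicator=True)                    # the _merge column records each row's origin

print(outer)
```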


## Grouping and Aggregation

## Aggregation Operations

**Basic aggregation functions:**
* count()
* first()
* last()
* mean()
* median()
* min()
* max()
* std()
* var()
* sum()

In [153]:
import seaborn as sns

In [155]:
df = sns.load_dataset("planets")

All datasets shipped with the seaborn library can be found at https://github.com/mwaskom/seaborn-data.

In [158]:
df.head()

|   | method | number | orbital_period | mass | distance | year |
| --- | --- | --- | --- | --- | --- | --- |
| 0 | Radial Velocity | 1 | 269.3 | 7.1 | 77.4 | 2006 |
| 1 | Radial Velocity | 1 | 874.774 | 2.21 | 56.95 | 2008 |
| 2 | Radial Velocity | 1 | 763.0 | 2.6 | 19.84 | 2011 |
| 3 | Radial Velocity | 1 | 326.03 | 19.4 | 110.62 | 2007 |
| 4 | Radial Velocity | 1 | 516.22 | 10.5 | 119.47 | 2009 |


In [160]:
df.shape

(1035, 6)

In [161]:
df.mean(numeric_only = True)

number               1.785507
orbital_period    2002.917596
mass                 2.638161
distance           264.069282
year              2009.070531
dtype: float64

In [163]:
df["mass"].mean()

2.6381605847953233

In [164]:
df.count()

method            1035
number            1035
orbital_period     992
mass               513
distance           808
year              1035
dtype: int64

In [165]:
df.min()

method            Astrometry
number                     1
orbital_period     0.0907063
mass                  0.0036
distance                1.35
year                    1989
dtype: object

In [166]:
df.max()

method            Transit Timing Variations
number                                    7
orbital_period                       730000
mass                                     25
distance                               8500
year                                   2014
dtype: object

In [167]:
df.sum()

method            Radial VelocityRadial VelocityRadial VelocityR...
number                                                         1848
orbital_period                                          1.98689e+06
mass                                                        1353.38
distance                                                     213368
year                                                        2079388
dtype: object

In [168]:
df.std(numeric_only = True)

number                1.240976
orbital_period    26014.728304
mass                  3.818617
distance            733.116493
year                  3.972567
dtype: float64

In [169]:
df.var(numeric_only = True)

number            1.540022e+00
orbital_period    6.767661e+08
mass              1.458183e+01
distance          5.374598e+05
year              1.578129e+01
dtype: float64

In [172]:
df.describe().T

|   | count | mean | std | min | 25% | 50% | 75% | max |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| number | 1035.0 | 1.785507 | 1.240976 | 1.0 | 1.0 | 1.0 | 2.0 | 7.0 |
| orbital_period | 992.0 | 2002.917596 | 26014.728304 | 0.090706 | 5.44254 | 39.9795 | 526.005 | 730000.0 |
| mass | 513.0 | 2.638161 | 3.818617 | 0.0036 | 0.229 | 1.26 | 3.04 | 25.0 |
| distance | 808.0 | 264.069282 | 733.116493 | 1.35 | 32.56 | 55.25 | 178.5 | 8500.0 |
| year | 1035.0 | 2009.070531 | 3.972567 | 1989.0 | 2007.0 | 2010.0 | 2012.0 | 2014.0 |


df.describe().T shows several important summary statistics at once.

In [173]:
df.dropna().describe().T

|   | count | mean | std | min | 25% | 50% | 75% | max |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| number | 498.0 | 1.73494 | 1.17572 | 1.0 | 1.0 | 1.0 | 2.0 | 6.0 |
| orbital_period | 498.0 | 835.778671 | 1469.128259 | 1.3283 | 38.27225 | 357.0 | 999.6 | 17337.5 |
| mass | 498.0 | 2.50932 | 3.636274 | 0.0036 | 0.2125 | 1.245 | 2.8675 | 25.0 |
| distance | 498.0 | 52.068213 | 46.596041 | 1.35 | 24.4975 | 39.94 | 59.3325 | 354.0 |
| year | 498.0 | 2007.37751 | 4.167284 | 1989.0 | 2005.0 | 2009.0 | 2011.0 | 2014.0 |


The added .dropna() call removes observations with missing values before the statistics are computed.
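By default dropna() discards a row if any column is missing, which is why the count falls to 498 for every variable. A sketch on a tiny made-up frame (the values are illustrative, not taken from the planets data) shows how to count missing values and how to restrict the check to selected columns with `subset`:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"mass": [7.1, np.nan, 2.6],
                   "distance": [77.4, 56.95, np.nan]})   # made-up toy values

missing_per_column = df.isnull().sum()      # number of NaN values in each column
all_complete = df.dropna()                  # drops rows with any NaN
mass_known = df.dropna(subset=["mass"])     # drops rows only when mass is NaN

print(missing_per_column)
```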

## Grouping Operations

In [5]:
df = pd.DataFrame({"gruplar": ["A","B","C","A","B","C"],
                  "veri": [10,11,52,23,43,44]}, columns = ["gruplar", "veri"])
df

|   | gruplar | veri |
| --- | --- | --- |
| 0 | A | 10 |
| 1 | B | 11 |
| 2 | C | 52 |
| 3 | A | 23 |
| 4 | B | 43 |
| 5 | C | 44 |


In [176]:
df.groupby("gruplar")

<pandas.core.groupby.generic.DataFrameGroupBy object at 0x0000022DCB424040>

In [178]:
df.groupby("gruplar").mean()

| gruplar | veri |
| --- | --- |
| A | 16.5 |
| B | 27.0 |
| C | 48.0 |


In [179]:
df.groupby("gruplar").sum()

| gruplar | veri |
| --- | --- |
| A | 33 |
| B | 54 |
| C | 96 |


In [182]:
import seaborn as sns

In [181]:
df = sns.load_dataset("planets")
df.head()

|   | method | number | orbital_period | mass | distance | year |
| --- | --- | --- | --- | --- | --- | --- |
| 0 | Radial Velocity | 1 | 269.3 | 7.1 | 77.4 | 2006 |
| 1 | Radial Velocity | 1 | 874.774 | 2.21 | 56.95 | 2008 |
| 2 | Radial Velocity | 1 | 763.0 | 2.6 | 19.84 | 2011 |
| 3 | Radial Velocity | 1 | 326.03 | 19.4 | 110.62 | 2007 |
| 4 | Radial Velocity | 1 | 516.22 | 10.5 | 119.47 | 2009 |


In [183]:
df.groupby("method")

<pandas.core.groupby.generic.DataFrameGroupBy object at 0x0000022DCB424E80>

In [185]:
df.groupby("method")["orbital_period"].mean()

method
Astrometry                          631.180000
Eclipse Timing Variations          4751.644444
Imaging                          118247.737500
Microlensing                       3153.571429
Orbital Brightness Modulation         0.709307
Pulsar Timing                      7343.021201
Pulsation Timing Variations        1170.000000
Radial Velocity                     823.354680
Transit                              21.102073
Transit Timing Variations            79.783500
Name: orbital_period, dtype: float64

These are the means of orbital_period for each group of the method variable.

In [186]:
df.groupby("method")["mass"].mean()

method
Astrometry                            NaN
Eclipse Timing Variations        5.125000
Imaging                               NaN
Microlensing                          NaN
Orbital Brightness Modulation         NaN
Pulsar Timing                         NaN
Pulsation Timing Variations           NaN
Radial Velocity                  2.630699
Transit                          1.470000
Transit Timing Variations             NaN
Name: mass, dtype: float64

In [188]:
df.groupby("method")["orbital_period"].describe()

| method | count | mean | std | min | 25% | 50% | 75% | max |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Astrometry | 2.0 | 631.18 | 544.217663 | 246.36 | 438.77 | 631.18 | 823.59 | 1016.0 |
| Eclipse Timing Variations | 9.0 | 4751.644444 | 2499.130945 | 1916.25 | 2900.0 | 4343.5 | 5767.0 | 10220.0 |
| Imaging | 12.0 | 118247.7375 | 213978.177277 | 4639.15 | 8343.9 | 27500.0 | 94250.0 | 730000.0 |
| Microlensing | 7.0 | 3153.571429 | 1113.166333 | 1825.0 | 2375.0 | 3300.0 | 3550.0 | 5100.0 |
| Orbital Brightness Modulation | 3.0 | 0.709307 | 0.725493 | 0.240104 | 0.291496 | 0.342887 | 0.943908 | 1.544929 |
| Pulsar Timing | 5.0 | 7343.021201 | 16313.265573 | 0.090706 | 25.262 | 66.5419 | 98.2114 | 36525.0 |
| Pulsation Timing Variations | 1.0 | 1170.0 | NaN | 1170.0 | 1170.0 | 1170.0 | 1170.0 | 1170.0 |
| Radial Velocity | 553.0 | 823.35468 | 1454.92621 | 0.73654 | 38.021 | 360.2 | 982.0 | 17337.5 |
| Transit | 397.0 | 21.102073 | 46.185893 | 0.355 | 3.16063 | 5.714932 | 16.1457 | 331.60059 |
| Transit Timing Variations | 3.0 | 79.7835 | 71.599884 | 22.3395 | 39.67525 | 57.011 | 108.5055 | 160.0 |


## Advanced Aggregation Operations (Aggregate, Filter, Transform, Apply)

In [189]:
import pandas as pd
df = pd.DataFrame({"gruplar": ["A","B","C","A","B","C"],
                  "degisken1": [10,23,33,22,11,99],
                  "degisken2": [100,253,333,262,111,969]},
                  columns = ["gruplar","degisken1","degisken2"])
df

|   | gruplar | degisken1 | degisken2 |
| --- | --- | --- | --- |
| 0 | A | 10 | 100 |
| 1 | B | 23 | 253 |
| 2 | C | 33 | 333 |
| 3 | A | 22 | 262 |
| 4 | B | 11 | 111 |
| 5 | C | 99 | 969 |


**Aggregate**

In [191]:
df.groupby("gruplar").mean()

| gruplar | degisken1 | degisken2 |
| --- | --- | --- |
| A | 16 | 181 |
| B | 17 | 182 |
| C | 66 | 651 |


In [192]:
df.groupby("gruplar").aggregate(["min", np.median, max])

| gruplar | degisken1: min | degisken1: median | degisken1: max | degisken2: min | degisken2: median | degisken2: max |
| --- | --- | --- | --- | --- | --- | --- |
| A | 10 | 16 | 22 | 100 | 181 | 262 |
| B | 11 | 17 | 23 | 111 | 182 | 253 |
| C | 33 | 66 | 99 | 333 | 651 | 969 |


Functions that pandas provides by name can be passed to aggregate either as a string ("...") or as a callable, but functions that pandas does not know by name cannot be passed as a string; they must be passed as callables.

In [196]:
df.groupby("gruplar").aggregate({"degisken1": ["min", "std"], "degisken2": ["max","var"]})

| gruplar | degisken1: min | degisken1: std | degisken2: max | degisken2: var |
| --- | --- | --- | --- | --- |
| A | 10 | 8.485281 | 262 | 13122 |
| B | 11 | 8.485281 | 253 | 10082 |
| C | 33 | 46.669048 | 969 | 202248 |
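The dictionary form above produces two-level column names. When flat, self-describing column names are preferred, pandas also offers named aggregation. In this sketch the output names `d1_min` and `d2_max` and the variable `ozet` are illustrative choices.

```python
import pandas as pd

df = pd.DataFrame({"gruplar": ["A", "B", "C", "A", "B", "C"],
                   "degisken1": [10, 23, 33, 22, 11, 99],
                   "degisken2": [100, 253, 333, 262, 111, 969]})

# Each keyword is an output column: (source column, aggregation function).
ozet = df.groupby("gruplar").agg(
    d1_min=("degisken1", "min"),
    d2_max=("degisken2", "max"),
)
print(ozet)
```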


**Filter**

In [197]:
import pandas as pd
df = pd.DataFrame({"gruplar": ["A","B","C","A","B","C"],
                  "degisken1": [10,23,33,22,11,99],
                  "degisken2": [100,253,333,262,111,969]},
                  columns = ["gruplar","degisken1","degisken2"])
df

,gruplar,degisken1,degisken2
0,A,10,100
1,B,23,253
2,C,33,333
3,A,22,262
4,B,11,111
5,C,99,969


In [199]:
def filter_func(x):
    return x["degisken1"].std() > 9

In [202]:
df.groupby("gruplar").std()

gruplar,degisken1,degisken2
A,8.485281,114.551299
B,8.485281,100.409163
C,46.669048,449.719913


In [203]:
df.groupby("gruplar").filter(filter_func)

,gruplar,degisken1,degisken2
2,C,33,333
5,C,99,969
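
The filter call above keeps or drops whole groups: the function receives each group's sub-DataFrame and returns a single True/False. As a rough sketch, the same selection can also be written as a row mask built with a group-wise transform. The names `kalan`, `maske` and `kalan2` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({"gruplar": ["A", "B", "C", "A", "B", "C"],
                   "degisken1": [10, 23, 33, 22, 11, 99],
                   "degisken2": [100, 253, 333, 262, 111, 969]})

# Group-level decision: keep every row of a group whose std exceeds 9.
kalan = df.groupby("gruplar").filter(lambda x: x["degisken1"].std() > 9)

# Same idea as a row mask: transform("std") repeats each group's std on
# every row of that group, so the comparison yields one boolean per row.
maske = df.groupby("gruplar")["degisken1"].transform("std") > 9
kalan2 = df[maske]

print(kalan)
print(kalan2)
```

Only group C passes the threshold here, so both variants return rows 2 and 5.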


**Transform**

In [204]:
import pandas as pd
df = pd.DataFrame({"gruplar": ["A","B","C","A","B","C"],
                  "degisken1": [10,23,33,22,11,99],
                  "degisken2": [100,253,333,262,111,969]},
                  columns = ["gruplar","degisken1","degisken2"])
df

,gruplar,degisken1,degisken2
0,A,10,100
1,B,23,253
2,C,33,333
3,A,22,262
4,B,11,111
5,C,99,969


In [211]:
df_a = df.iloc[:,1:3]
df_a

,degisken1,degisken2
0,10,100
1,23,253
2,33,333
3,22,262
4,11,111
5,99,969


In [213]:
df_a.transform(lambda x: (x-x.mean()) / x.std())

,degisken1,degisken2
0,-0.687871,-0.738461
1,-0.299074,-0.263736
2,0.0,-0.015514
3,-0.328982,-0.235811
4,-0.657963,-0.704331
5,1.97389,1.957853


Here x stands for a variable, that is, a column: transform passes the columns to the function one at a time.
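
A minimal sketch of this column-by-column behaviour, using the same data. The second part, a group-wise transform that subtracts each group's own mean, is an extra illustration and does not appear in the cells above.

```python
import pandas as pd

df = pd.DataFrame({"gruplar": ["A", "B", "C", "A", "B", "C"],
                   "degisken1": [10, 23, 33, 22, 11, 99],
                   "degisken2": [100, 253, 333, 262, 111, 969]})
df_a = df[["degisken1", "degisken2"]]

# x is one column at a time, so each column is standardised with its own
# mean and standard deviation.
standart = df_a.transform(lambda x: (x - x.mean()) / x.std())
print(standart.mean())  # close to 0 for every column
print(standart.std())   # close to 1 for every column

# With groupby, x is one column of one group: here each value is centred
# on the mean of its own group.
grup_ici = df.groupby("gruplar")[["degisken1", "degisken2"]].transform(
    lambda x: x - x.mean()
)
print(grup_ici)
```

Unlike aggregate, transform returns a result with the same number of rows as the input, which makes it convenient for adding derived columns.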

**Apply**

In [219]:
import pandas as pd
df = pd.DataFrame({"gruplar": ["A","B","C","A","B","C"],
                  "degisken1": [10,23,33,22,11,99],
                  "degisken2": [100,253,333,262,111,969]},
                  columns = ["gruplar","degisken1","degisken2"])
df

,gruplar,degisken1,degisken2
0,A,10,100
1,B,23,253
2,C,33,333
3,A,22,262
4,B,11,111
5,C,99,969


In [220]:
df.apply(np.sum)

gruplar      ABCABC
degisken1       198
degisken2      2028
dtype: object

In [223]:
df.groupby("gruplar").apply(np.mean)

gruplar,degisken1,degisken2
A,16.0,181.0
B,17.0,182.0
C,66.0,651.0


## Pivot Tables

In [6]:
import pandas as pd
import seaborn as sns
titanic = sns.load_dataset("titanic")
titanic.head()

,survived,pclass,sex,age,sibsp,parch,fare,embarked,class,who,adult_male,deck,embark_town,alive,alone
0,0,3,male,22.0,1,0,7.25,S,Third,man,True,,Southampton,no,False
1,1,1,female,38.0,1,0,71.2833,C,First,woman,False,C,Cherbourg,yes,False
2,1,3,female,26.0,0,0,7.925,S,Third,woman,False,,Southampton,yes,True
3,1,1,female,35.0,1,0,53.1,S,First,woman,False,C,Southampton,yes,False
4,0,3,male,35.0,0,0,8.05,S,Third,man,True,,Southampton,no,True


In [7]:
titanic.groupby("sex")["survived"].mean()

sex
female    0.742038
male      0.188908
Name: survived, dtype: float64

In [8]:
titanic.groupby("sex")[["survived"]].mean()

sex,survived
female,0.742038
male,0.188908


In [14]:
titanic.groupby(["sex", "class"])[["survived"]].aggregate("mean").unstack()

,survived,survived,survived
class,First,Second,Third
sex,,,
female,0.968085,0.921053,0.5
male,0.368852,0.157407,0.135447


**Pivoting with pivot_table:**

In [10]:
titanic.pivot_table("survived", index = "sex", columns = "class")

class,First,Second,Third
sex,,,
female,0.968085,0.921053,0.5
male,0.368852,0.157407,0.135447
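
The two routes above (groupby followed by unstack, and pivot_table) give the same table because pivot_table aggregates with the mean by default. A small sketch on hypothetical toy data, used instead of the titanic set so that nothing has to be downloaded:

```python
import pandas as pd

# Hypothetical toy data standing in for the titanic columns used above.
veri = pd.DataFrame({
    "sex": ["female", "female", "male", "male", "female", "male"],
    "class": ["First", "Third", "First", "Third", "First", "Third"],
    "survived": [1, 0, 0, 0, 1, 1],
})

# Long route: group, aggregate, then move "class" into the columns.
uzun_yol = veri.groupby(["sex", "class"])["survived"].mean().unstack()

# Short route: pivot_table uses the mean as its default aggregation.
kisa_yol = veri.pivot_table("survived", index="sex", columns="class")
print(kisa_yol)

# A different aggregation can be requested with aggfunc.
toplam = veri.pivot_table("survived", index="sex", columns="class",
                          aggfunc="sum")
print(toplam)
```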


In [246]:
titanic.age.head()

0    22.0
1    38.0
2    26.0
3    35.0
4    35.0
Name: age, dtype: float64

In [248]:
age = pd.cut(titanic["age"], [0,18,90])
age.head(10)

0    (18.0, 90.0]
1    (18.0, 90.0]
2    (18.0, 90.0]
3    (18.0, 90.0]
4    (18.0, 90.0]
5             NaN
6    (18.0, 90.0]
7     (0.0, 18.0]
8    (18.0, 90.0]
9     (0.0, 18.0]
Name: age, dtype: category
Categories (2, interval[int64]): [(0, 18] < (18, 90]]
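
The output above shows two properties of pd.cut: the intervals are closed on the right, and missing values stay missing (row 5). A short sketch on hypothetical ages; the labels "cocuk" and "yetiskin" are made up for the illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical ages, including a missing value and a value on the bin edge.
yaslar = pd.Series([22.0, 38.0, np.nan, 2.0, 18.0])

# Bins (0, 18] and (18, 90]: 18 falls into the first bin, NaN stays NaN.
araliklar = pd.cut(yaslar, [0, 18, 90])
print(araliklar)

# Readable category names can be supplied with labels.
etiketli = pd.cut(yaslar, [0, 18, 90], labels=["cocuk", "yetiskin"])
print(etiketli)
```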

In [249]:
titanic.pivot_table("survived", ["sex",age], "class")

,class,First,Second,Third
sex,age,,,
female,"(0, 18]",0.909091,1.0,0.511628
female,"(18, 90]",0.972973,0.9,0.423729
male,"(0, 18]",0.8,0.6,0.215686
male,"(18, 90]",0.375,0.071429,0.133663


## Reading Data from External Sources

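To read an external data set with a relative path, as in the examples here, the file must be located in (or below) the directory in which the Python / JupyterLab session is running. Otherwise the full path of the file has to be given.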
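
A quick way to check where a relative path will be looked up, sketched with the standard library. The path `reading_data/ornekcsv.csv` is the one used in this section; whether it exists depends on your own setup.

```python
from pathlib import Path

# Directory in which the current Python / JupyterLab session is running.
calisma_dizini = Path.cwd()
print(calisma_dizini)

# Relative paths are resolved against that directory.
dosya = Path("reading_data") / "ornekcsv.csv"
print(dosya.resolve())

# False here means read_csv would not find the file under this path.
print(dosya.exists())
```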

In [250]:
import pandas as pd

Files with the .csv and .txt extensions are read with the read_csv function.

**Reading csv**

In [252]:
pd.read_csv("reading_data/ornekcsv.csv", sep = ";")

,a,b,c
0,78,12,1.0
1,78,12,2.0
2,78,324,3.0
3,7,2,4.0
4,88,23,5.0
5,6,2,
6,56,11,6.0
7,7,12,7.0
8,56,21,7.0
9,346,2,8.0


The sep = ";" argument tells read_csv which delimiter separates the variables in the file, so that the values are split into separate columns instead of being read as one string.
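
The effect of sep can be seen without any file on disk. The text below is hypothetical sample content in the same semicolon-separated layout, read from memory through io.StringIO.

```python
import io
import pandas as pd

# Hypothetical file content with ";" as the delimiter.
metin = "a;b;c\n78;12;1\n7;2;4\n"

# Default delimiter is ",": each line ends up in a single column.
yanlis = pd.read_csv(io.StringIO(metin))
print(yanlis.shape)

# With sep=";" the three variables become three columns.
dogru = pd.read_csv(io.StringIO(metin), sep=";")
print(dogru.shape)
print(list(dogru.columns))
```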

**Reading txt**

In [253]:
pd.read_csv("reading_data/duz_metin.txt")

,1 2
0,2 2
1,3 2
2,4 2
3,5 2
4,6 2
5,7 2
6,8 2
7,9 2
8,10 2
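
In the output above, both values of each line end up in one column, and the first line is used as the header. That suggests the text file is separated by whitespace rather than commas, although this is an assumption based only on the output. A sketch with hypothetical content of the same shape:

```python
import io
import pandas as pd

# Hypothetical content mimicking the layout seen above: two values per
# line, separated by a space, no header line.
metin = "1 2\n2 2\n3 2\n"

# Default settings: one column, and the first line is taken as its name.
tek_sutun = pd.read_csv(io.StringIO(metin))
print(list(tek_sutun.columns))

# sep=r"\s+" splits on any whitespace; header=None keeps the first line
# as data instead of using it for the column names.
iki_sutun = pd.read_csv(io.StringIO(metin), sep=r"\s+", header=None)
print(iki_sutun)
```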


**Reading excel**

In [254]:
pd.read_excel("reading_data/ornekx.xlsx")

,a,b,c
0,78,12,1.0
1,78,12,2.0
2,78,324,3.0
3,7,2,4.0
4,88,23,5.0
5,6,2,
6,56,11,6.0
7,7,12,7.0
8,56,21,7.0
9,346,2,8.0


In [255]:
df = pd.read_excel("reading_data/ornekx.xlsx")

In [256]:
type(df)

pandas.core.frame.DataFrame

In [257]:
df.head()

,a,b,c
0,78,12,1.0
1,78,12,2.0
2,78,324,3.0
3,7,2,4.0
4,88,23,5.0


In [258]:
df.columns

Index(['a', 'b', 'c'], dtype='object')

In [259]:
df.columns = ("A","B","C")

In [261]:
df.head()

,A,B,C
0,78,12,1.0
1,78,12,2.0
2,78,324,3.0
3,7,2,4.0
4,88,23,5.0


If a data set cannot be downloaded from GitHub, open the file on GitHub and click the "Raw" button at the top right. Select everything on the resulting page with CTRL + A and copy it, then paste it into a text file created in the working directory and save the file. The data set can then be read with the txt reading command shown above.