## Using Data Structures in Pandas

In [1]:
import pandas as pd
import numpy as np

### Contents
1. [Introduction](#intro)
2. [Data types](#dtypes)
3. [Series](#series)
4. [DataFrame](#dataframe)

<!-- ![Netflix logo](../images/netflix_logo.png) -->
<img src="../images/netflix_logo.png" width="300">

The <i>Netflix Movies and TV Shows</i> dataset contains the metadata of all movies and TV shows available on Netflix. It can be downloaded from Kaggle at https://www.kaggle.com/shivamb/netflix-shows. The dataset is updated regularly.

### Introduction <a name="intro"></a>

In [2]:
df = pd.read_csv("../data/netflix_titles.csv", sep=",", header=0)

In [3]:
df.head()

|   | show_id | type | title | director | cast | country | date_added | release_year | rating | duration | listed_in | description |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | s1 | Movie | Dick Johnson Is Dead | Kirsten Johnson | NaN | United States | September 25, 2021 | 2020 | PG-13 | 90 min | Documentaries | As her father nears the end of his life, filmm... |
| 1 | s2 | TV Show | Blood & Water | NaN | Ama Qamata, Khosi Ngema, Gail Mabalane, Thaban... | South Africa | September 24, 2021 | 2021 | TV-MA | 2 Seasons | International TV Shows, TV Dramas, TV Mysteries | After crossing paths at a party, a Cape Town t... |
| 2 | s3 | TV Show | Ganglands | Julien Leclercq | Sami Bouajila, Tracy Gotoas, Samuel Jouy, Nabi... | NaN | September 24, 2021 | 2021 | TV-MA | 1 Season | Crime TV Shows, International TV Shows, TV Act... | To protect his family from a powerful drug lor... |
| 3 | s4 | TV Show | Jailbirds New Orleans | NaN | NaN | NaN | September 24, 2021 | 2021 | TV-MA | 1 Season | Docuseries, Reality TV | Feuds, flirtations and toilet talk go down amo... |
| 4 | s5 | TV Show | Kota Factory | NaN | Mayur More, Jitendra Kumar, Ranjan Raj, Alam K... | India | September 24, 2021 | 2021 | TV-MA | 2 Seasons | International TV Shows, Romantic TV Shows, TV ... | In a city of coaching centers known to train I... |


| Data structure | Dimensions | Description |
| -------------- | ---------- | ----------- |
| Series | 1 | one-dimensional labeled array that can hold data of any type (integers, strings, floats, Python objects, etc.) |
| DataFrame | 2 | two-dimensional, size-mutable tabular structure with labeled axes (rows and columns); its columns can hold values of different types |
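A minimal sketch of the two structures side by side, using small hypothetical data (not taken from the Netflix dataset): a Series has one axis, a DataFrame has two, and each DataFrame column is itself a Series with its own dtype.

```python
import pandas as pd

# Hypothetical data, chosen only to illustrate the two structures
ser = pd.Series([10, 20, 30], index=["a", "b", "c"])  # 1-D, labeled

frame = pd.DataFrame({
    "name": ["x", "y"],   # text column
    "score": [1.5, 2.0],  # float column
    "count": [3, 4],      # integer column
})

print(ser.ndim, frame.ndim)   # number of axes of each structure
print(frame.dtypes)           # one dtype per column
print(type(frame["score"]))   # a single column is a Series
```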

In [4]:
s = df.title

In [5]:
s.head()

0     Dick Johnson Is Dead
1            Blood & Water
2                Ganglands
3    Jailbirds New Orleans
4             Kota Factory
Name: title, dtype: object

### Data types <a name="dtypes"></a>

In [6]:
df.dtypes

show_id         object
type            object
title           object
director        object
cast            object
country         object
date_added      object
release_year     int64
rating          object
duration        object
listed_in       object
description     object
dtype: object

In [7]:
df.title.dtype

dtype('O')

In [8]:
df['title'].dtype

dtype('O')

In [9]:
type(df)

pandas.core.frame.DataFrame

In [10]:
type(s)

pandas.core.series.Series
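`dtype('O')` is NumPy's type code for generic Python objects; by default in pandas 2.x, `read_csv` stores text columns this way. A short sketch, on two rows copied from the `df.head()` output above, of inspecting and converting dtypes with the standard `select_dtypes` and `astype` methods (the small frame is only an illustration):

```python
import numpy as np
import pandas as pd

# Two rows taken from the df.head() output above
sample = pd.DataFrame({
    "title": ["Dick Johnson Is Dead", "Blood & Water"],
    "release_year": [2020, 2021],
})

print(np.dtype("O") == np.dtype(object))  # 'O' is the object type code
print(sample.dtypes)

# Keep only the numeric columns
numeric = sample.select_dtypes(include="number")
print(list(numeric.columns))

# Convert the text column to pandas' dedicated string dtype
titles = sample["title"].astype("string")
print(titles.dtype)
```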

### Series data structures <a name="series"></a>

In [11]:
s.head()

0     Dick Johnson Is Dead
1            Blood & Water
2                Ganglands
3    Jailbirds New Orleans
4             Kota Factory
Name: title, dtype: object

In [12]:
# <Series object>.index - the index (row labels) of the series
s.index

RangeIndex(start=0, stop=8807, step=1)

In [13]:
# <Series object>.values - returns the data of the series as a NumPy ndarray
s.values

array(['Dick Johnson Is Dead', 'Blood & Water', 'Ganglands', ...,
       'Zombieland', 'Zoom', 'Zubaan'], shape=(8807,), dtype=object)

In [14]:
s = pd.read_csv("../data/netflix_titles.csv", sep=",", usecols=['title']).squeeze("columns")

In [15]:
s.head()

0     Dick Johnson Is Dead
1            Blood & Water
2                Ganglands
3    Jailbirds New Orleans
4             Kota Factory
Name: title, dtype: object
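With `usecols=['title']`, `read_csv` still returns a one-column DataFrame; `.squeeze("columns")` then turns it into a Series (the older `squeeze=True` argument of `read_csv` was removed in pandas 2.0). A self-contained sketch in which a small in-memory CSV, a hypothetical stand-in for `../data/netflix_titles.csv`, replaces the real file:

```python
import io

import pandas as pd

# Hypothetical in-memory CSV standing in for ../data/netflix_titles.csv
csv_text = "show_id,title\ns1,Dick Johnson Is Dead\ns2,Blood & Water\n"

frame = pd.read_csv(io.StringIO(csv_text), usecols=["title"])
print(type(frame).__name__, frame.shape)   # still a DataFrame, one column

series = frame.squeeze("columns")          # one column -> Series
print(type(series).__name__, series.name)  # column name becomes the Series name
```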

In [16]:
tutor = pd.Series(data = ['Marian Bucos', 'Sl.dr.ing.', 345, 'Programare pentru Ingineria Datelor',
                          'https://datalab.upt.ro/cursuri/programare-pentru-ingineria-datelor'])

In [17]:
tutor

0                                         Marian Bucos
1                                           Sl.dr.ing.
2                                                  345
3                  Programare pentru Ingineria Datelor
4    https://datalab.upt.ro/cursuri/programare-pent...
dtype: object

In [18]:
tutor = pd.Series(data = ['Marian Bucos', 'Sl.dr.ing.', 345, 'Programare pentru Ingineria Datelor',
                          'https://datalab.upt.ro/cursuri/programare-pentru-ingineria-datelor'],
                  index = ['nume', 'grad', 'id curs', 'denumire curs', 'adresa curs'])

In [19]:
tutor

nume                                                  Marian Bucos
grad                                                    Sl.dr.ing.
id curs                                                        345
denumire curs                  Programare pentru Ingineria Datelor
adresa curs      https://datalab.upt.ro/cursuri/programare-pent...
dtype: object

In [20]:
tutor['denumire curs']

'Programare pentru Ingineria Datelor'

In [21]:
tutor[['nume', 'denumire curs']]

nume                                    Marian Bucos
denumire curs    Programare pentru Ingineria Datelor
dtype: object

In [22]:
tutor = pd.Series({'nume': 'Marian Bucos',
                   'grad': 'Sl.dr.ing.',
                   'id curs': 345,
                   'denumire curs': 'Programare pentru Ingineria Datelor',
                   'adresa curs': 'https://datalab.upt.ro/cursuri/programare-pentru-ingineria-datelor'})

In [23]:
tutor

nume                                                  Marian Bucos
grad                                                    Sl.dr.ing.
id curs                                                        345
denumire curs                  Programare pentru Ingineria Datelor
adresa curs      https://datalab.upt.ro/cursuri/programare-pent...
dtype: object

In [24]:
tutor[['nume', 'denumire curs']]

nume                                    Marian Bucos
denumire curs    Programare pentru Ingineria Datelor
dtype: object
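When a Series is built from a dict, the keys become the index labels, in insertion order. A small sketch with hypothetical values showing three common lookups: a single label returns the value, a list of labels returns a Series, and `Series.get` returns a default instead of raising `KeyError` for a missing label.

```python
import pandas as pd

# Hypothetical record; the keys become the index labels
record = pd.Series({"nume": "Ana", "grad": "Sl", "birou": "B12"})

print(record.index.tolist())       # ['nume', 'grad', 'birou']
print(record["birou"])             # single label -> scalar
print(record[["nume", "birou"]])   # list of labels -> Series
print(record.get("email", "n/a"))  # missing label -> default value

try:
    record["email"]                # plain [] raises for a missing label
except KeyError:
    print("KeyError: 'email'")
```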

In [25]:
type(tutor)

pandas.core.series.Series

In [26]:
an_admitere = np.array([2016, 2018, 2020])

In [27]:
an_admitere = np.array([2003, 2016, 2018, 2020])
studenti = pd.Series(data = an_admitere,
                     index = ['Casandra', 'Zaya', 'Reya', 'Maru'])

In [28]:
studenti

Casandra    2003
Zaya        2016
Reya        2018
Maru        2020
dtype: int64
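A Series built from an ndarray supports vectorized arithmetic that keeps the labels, and operations between two Series align on labels rather than positions. A sketch reusing three of the names and years above, plus a hypothetical `bonus` Series with a different set of labels:

```python
import numpy as np
import pandas as pd

# Names and years reused from the example above
students = pd.Series(np.array([2016, 2018, 2020]), index=["Zaya", "Reya", "Maru"])

# Vectorized arithmetic keeps the index labels
print(students + 1)

# Hypothetical second Series covering only some of the labels
bonus = pd.Series({"Maru": 1, "Zaya": 2})

# Alignment is by label; a label missing from one operand gives NaN,
# so the result is upcast to float64
total = students + bonus
print(total)
```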

In [29]:
tutor['grad'] = 'Prof.dr.ing.'

In [30]:
tutor

nume                                                  Marian Bucos
grad                                                  Prof.dr.ing.
id curs                                                        345
denumire curs                  Programare pentru Ingineria Datelor
adresa curs      https://datalab.upt.ro/cursuri/programare-pent...
dtype: object

In [31]:
tutor['birou'] = 'DataLab B226'

In [32]:
tutor.iloc[0]


'Marian Bucos'

In [33]:
tutor[0:2]

nume    Marian Bucos
grad    Prof.dr.ing.
dtype: object

In [34]:
tutor[::-1]

birou                                                 DataLab B226
adresa curs      https://datalab.upt.ro/cursuri/programare-pent...
denumire curs                  Programare pentru Ingineria Datelor
id curs                                                        345
grad                                                  Prof.dr.ing.
nume                                                  Marian Bucos
dtype: object

In [35]:
pd.Series(['Casandra', 'Zaya', 'Reya', 'Maru'], dtype = 'string')

0    Casandra
1        Zaya
2        Reya
3        Maru
dtype: string

In [36]:
pd.Series(['Casandra', 'Zaya', 'Reya', 'Maru'], dtype = 'object')

0    Casandra
1        Zaya
2        Reya
3        Maru
dtype: object
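The two dtypes differ mainly in what they hold and how they mark gaps: `dtype='string'` stores text only and represents missing values as `pd.NA`, while `object` can hold any Python object, including `None`. A small sketch with a hypothetical missing entry:

```python
import pandas as pd

# The second entry is a hypothetical missing value
as_object = pd.Series(["Casandra", None], dtype="object")
as_string = pd.Series(["Casandra", None], dtype="string")

print(as_object[1])           # None is kept as-is
print(as_string[1])           # <NA>: the dedicated missing marker
print(as_string.str.upper())  # .str methods work and keep the string dtype
```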

In [37]:
s.head()

0     Dick Johnson Is Dead
1            Blood & Water
2                Ganglands
3    Jailbirds New Orleans
4             Kota Factory
Name: title, dtype: object

In [38]:
s.str[0:1].head()

0    D
1    B
2    G
3    J
4    K
Name: title, dtype: object

In [39]:
s.groupby(s.str[0:1])\
    .count().head(5)

title
#    11
'     2
(     2
1    39
2    27
Name: title, dtype: int64

In [40]:
s.groupby(s.str[0:1])\
    .count()\
    .sort_values(ascending=False)\
    .head()

title
T    1525
S     719
M     637
B     576
A     566
Name: title, dtype: int64
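The same ranking can also be obtained with `value_counts()`, which counts occurrences and sorts them in descending order by default. A sketch on five titles taken from the dataset output above; note that `.str[0]` yields NaN for an empty string, whereas `.str[0:1]` yields `''`.

```python
import pandas as pd

# Titles taken from the dataset output above
titles = pd.Series(["Zoom", "Zubaan", "Zodiac", "Ganglands", "Kota Factory"], name="title")
first = titles.str[0:1]

# groupby + count + sort, as in the cell above
via_groupby = titles.groupby(first).count().sort_values(ascending=False)

# value_counts counts and sorts (descending) in one step
via_value_counts = first.value_counts()

print(via_groupby.head(1))
print(via_value_counts.head(1))
```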

In [41]:
s.axes, tutor.axes

([RangeIndex(start=0, stop=8807, step=1)],
 [Index(['nume', 'grad', 'id curs', 'denumire curs', 'adresa curs', 'birou'], dtype='object')])

In [42]:
s.dtype, tutor.dtype

(dtype('O'), dtype('O'))

In [43]:
s.hasnans, tutor.hasnans

(False, False)

In [44]:
s.index, tutor.index

(RangeIndex(start=0, stop=8807, step=1),
 Index(['nume', 'grad', 'id curs', 'denumire curs', 'adresa curs', 'birou'], dtype='object'))

In [45]:
s.name, tutor.name

('title', None)

In [46]:
s.ndim, tutor.ndim

(1, 1)

In [47]:
s.nbytes, tutor.nbytes

(70456, 48)

In [48]:
s.shape, tutor.shape

((8807,), (6,))

In [49]:
s.size, tutor.size

(8807, 6)

In [50]:
s.values, tutor.values

(array(['Dick Johnson Is Dead', 'Blood & Water', 'Ganglands', ...,
        'Zombieland', 'Zoom', 'Zubaan'], shape=(8807,), dtype=object),
 array(['Marian Bucos', 'Prof.dr.ing.', 345,
        'Programare pentru Ingineria Datelor',
        'https://datalab.upt.ro/cursuri/programare-pentru-ingineria-datelor',
        'DataLab B226'], dtype=object))

In [51]:
s.iloc[1:3]

1    Blood & Water
2        Ganglands
Name: title, dtype: object

In [52]:
tutor.loc[['nume', 'id curs']]

nume       Marian Bucos
id curs             345
dtype: object

In [53]:
# <Series object>.dtype, <Series object>.dtypes - return the data type of the stored values
s.dtype

dtype('O')

In [54]:
# <Series object>.shape - returns a tuple holding the number of elements in the series
s.shape

(8807,)

In [55]:
# <Series object>.empty - returns True if the series has no elements
s.empty

False

In [56]:
# <Series object>.name - returns the name of the series
s.name

'title'
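The attributes above can be checked on a small hypothetical float Series with one missing value, where the expected results are easy to verify by hand (for example, `nbytes` is 3 values times 8 bytes per float64):

```python
import numpy as np
import pandas as pd

# Hypothetical float Series with one missing value
ser = pd.Series([1.0, np.nan, 3.0], index=["x", "y", "z"], name="measure")

print(ser.shape, ser.size, ser.ndim)  # (3,) 3 1
print(ser.hasnans, ser.empty)         # True False
print(ser.nbytes)                     # 3 float64 values x 8 bytes = 24
print(ser.name)                       # measure

empty = pd.Series([], dtype="float64")
print(empty.empty, empty.shape)       # True (0,)
```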

### DataFrame data structures <a name="dataframe"></a>

In [57]:
df = pd.read_csv("../data/netflix_titles.csv", sep=",", header=0)

In [118]:
print(df.head())

  show_id     type                  title         director  \
0      s1    Movie   Dick Johnson Is Dead  Kirsten Johnson   
1      s2  TV Show          Blood & Water              NaN   
2      s3  TV Show              Ganglands  Julien Leclercq   
3      s4  TV Show  Jailbirds New Orleans              NaN   
4      s5  TV Show           Kota Factory              NaN   

                                                cast        country  \
0                                                NaN  United States   
1  Ama Qamata, Khosi Ngema, Gail Mabalane, Thaban...   South Africa   
2  Sami Bouajila, Tracy Gotoas, Samuel Jouy, Nabi...            NaN   
3                                                NaN            NaN   
4  Mayur More, Jitendra Kumar, Ranjan Raj, Alam K...          India   

           date_added  release_year rating   duration  \
0  September 25, 2021          2020  PG-13     90 min   
1  September 24, 2021          2021  TV-MA  2 Seasons   
2  September 24, 2021        

In [59]:
df.columns.values

array(['show_id', 'type', 'title', 'director', 'cast', 'country',
       'date_added', 'release_year', 'rating', 'duration', 'listed_in',
       'description'], dtype=object)

In [60]:
df.index.values

array([   0,    1,    2, ..., 8804, 8805, 8806], shape=(8807,))

In [61]:
df.set_index('show_id').head()

| show_id | type | title | director | cast | country | date_added | release_year | rating | duration | listed_in | description |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| s1 | Movie | Dick Johnson Is Dead | Kirsten Johnson | NaN | United States | September 25, 2021 | 2020 | PG-13 | 90 min | Documentaries | As her father nears the end of his life, filmm... |
| s2 | TV Show | Blood & Water | NaN | Ama Qamata, Khosi Ngema, Gail Mabalane, Thaban... | South Africa | September 24, 2021 | 2021 | TV-MA | 2 Seasons | International TV Shows, TV Dramas, TV Mysteries | After crossing paths at a party, a Cape Town t... |
| s3 | TV Show | Ganglands | Julien Leclercq | Sami Bouajila, Tracy Gotoas, Samuel Jouy, Nabi... | NaN | September 24, 2021 | 2021 | TV-MA | 1 Season | Crime TV Shows, International TV Shows, TV Act... | To protect his family from a powerful drug lor... |
| s4 | TV Show | Jailbirds New Orleans | NaN | NaN | NaN | September 24, 2021 | 2021 | TV-MA | 1 Season | Docuseries, Reality TV | Feuds, flirtations and toilet talk go down amo... |
| s5 | TV Show | Kota Factory | NaN | Mayur More, Jitendra Kumar, Ranjan Raj, Alam K... | India | September 24, 2021 | 2021 | TV-MA | 2 Seasons | International TV Shows, Romantic TV Shows, TV ... | In a city of coaching centers known to train I... |


In [62]:
# Set the index (row labels) of the DataFrame from one or more existing columns.
# The new index can replace the current index or extend it (append=True); a new DataFrame is returned and df is left unchanged.
df.set_index('show_id').index.values

array(['s1', 's2', 's3', ..., 's8805', 's8806', 's8807'],
      shape=(8807,), dtype=object)

In [63]:
# Reset the DataFrame index and use the default integer index (RangeIndex) instead.
# Since df already has the default index, the old index is added back as a column named 'index'.
df.reset_index().index.values

array([   0,    1,    2, ..., 8804, 8805, 8806], shape=(8807,))
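A round trip shows what the two methods do more clearly than calling `reset_index` on a frame that already has the default index. A sketch on three rows taken from the dataset output above:

```python
import pandas as pd

# Three rows taken from the dataset output above
shows = pd.DataFrame({
    "show_id": ["s1", "s2", "s3"],
    "title": ["Dick Johnson Is Dead", "Blood & Water", "Ganglands"],
})

by_id = shows.set_index("show_id")  # returns a new DataFrame
print(by_id.loc["s2", "title"])     # look up a row by its show_id label
print(shows.columns.tolist())       # the original frame is unchanged

restored = by_id.reset_index()      # show_id becomes a regular column again
print(restored.columns.tolist(), restored.index)

# On a frame that already has the default index, reset_index adds the
# old index as a column named 'index'; drop=True discards it instead
print(shows.reset_index().columns.tolist())
print(shows.reset_index(drop=True).columns.tolist())
```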

In [64]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 8807 entries, 0 to 8806
Data columns (total 12 columns):
 #   Column        Non-Null Count  Dtype 
---  ------        --------------  ----- 
 0   show_id       8807 non-null   object
 1   type          8807 non-null   object
 2   title         8807 non-null   object
 3   director      6173 non-null   object
 4   cast          7982 non-null   object
 5   country       7976 non-null   object
 6   date_added    8797 non-null   object
 7   release_year  8807 non-null   int64 
 8   rating        8803 non-null   object
 9   duration      8804 non-null   object
 10  listed_in     8807 non-null   object
 11  description   8807 non-null   object
dtypes: int64(1), object(11)
memory usage: 825.8+ KB


In [65]:
tutori = pd.DataFrame(
    data={'nume': ['Adrian Cosma', 'Mihai Popescu', 'Florin Vlad', 'Radu Marinescu'],
          'grad': ['Sl', 'Sl', 'Prof', 'Prof'],
          'birou': ['A236', 'B123', 'C314', 'A210'],
          'varsta': [43, 35, 56, 62],
          'departament': ['COM', 'MEO', 'EA', 'COM']
         },
    index=pd.date_range(
        start='1/10/2010',
        end='1/10/2014',
        periods=4,
        name='data_angajarii'
    )
)

In [66]:
print(tutori)

                          nume  grad birou  varsta departament
data_angajarii                                                
2010-01-10        Adrian Cosma    Sl  A236      43         COM
2011-05-12       Mihai Popescu    Sl  B123      35         MEO
2012-09-10         Florin Vlad  Prof  C314      56          EA
2014-01-10      Radu Marinescu  Prof  A210      62         COM
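The hiring dates above come from `pd.date_range`: given `start`, `end` and `periods`, it returns `periods` timestamps spaced evenly between the two ends (here 1461 days / 3 = 487 days apart), not aligned to calendar boundaries. A sketch that reproduces this index and contrasts it with a fixed frequency:

```python
import pandas as pd

# Same call as above: 4 evenly spaced timestamps between start and end
even = pd.date_range(start="1/10/2010", end="1/10/2014", periods=4)
print(even.strftime("%Y-%m-%d").tolist())
print(even[1] - even[0])  # constant step between consecutive dates

# With a frequency instead of an end date, dates follow that calendar step
daily = pd.date_range(start="2010-01-10", periods=3, freq="D")
print(daily.strftime("%Y-%m-%d").tolist())
```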


In [67]:
tutori.info()

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 4 entries, 2010-01-10 to 2014-01-10
Data columns (total 5 columns):
 #   Column       Non-Null Count  Dtype 
---  ------       --------------  ----- 
 0   nume         4 non-null      object
 1   grad         4 non-null      object
 2   birou        4 non-null      object
 3   varsta       4 non-null      int64 
 4   departament  4 non-null      object
dtypes: int64(1), object(4)
memory usage: 192.0+ bytes


In [68]:
orase = pd.DataFrame(
    ['Timisoara', 'Cluj-Napoca', 'Bucuresti', 'Iasi', 'Constanta', 'Brasov', 'Anina'],
    columns=['nume'])

In [69]:
orase

|   | nume |
| --- | --- |
| 0 | Timisoara |
| 1 | Cluj-Napoca |
| 2 | Bucuresti |
| 3 | Iasi |
| 4 | Constanta |
| 5 | Brasov |
| 6 | Anina |


In [70]:
orase = pd.DataFrame([
    {'nume': 'Timisoara', 'judet': 'Timis'},
    {'nume': 'Cluj-Napoca'},
    {'nume': 'Bucuresti'},
    {'nume': 'Iasi', 'judet': 'Iasi'},
    {'nume': 'Constanta', 'judet': 'Constanta'},
    {'nume': 'Brasov', 'judet': 'Brasov'},
    {'nume': 'Anina', 'judet': 'Caras-Severin'}
])

In [71]:
orase

|   | nume | judet |
| --- | --- | --- |
| 0 | Timisoara | Timis |
| 1 | Cluj-Napoca | NaN |
| 2 | Bucuresti | NaN |
| 3 | Iasi | Iasi |
| 4 | Constanta | Constanta |
| 5 | Brasov | Brasov |
| 6 | Anina | Caras-Severin |


In [72]:
title = pd.read_csv("../data/netflix_titles.csv", sep=",", usecols=['title']).squeeze("columns")

In [73]:
director = df['director']

In [74]:
df2 = pd.DataFrame({
    'title': title,
    'director': director,
    'imdb': np.nan,
    'views': np.random.randint(1, 1000000, size=len(df.index))})

In [75]:
print(df2)

                      title         director  imdb   views
0      Dick Johnson Is Dead  Kirsten Johnson   NaN  655091
1             Blood & Water              NaN   NaN     290
2                 Ganglands  Julien Leclercq   NaN  437529
3     Jailbirds New Orleans              NaN   NaN  188782
4              Kota Factory              NaN   NaN  566950
...                     ...              ...   ...     ...
8802                 Zodiac    David Fincher   NaN  507898
8803            Zombie Dumb              NaN   NaN  603817
8804             Zombieland  Ruben Fleischer   NaN    5335
8805                   Zoom     Peter Hewitt   NaN  490185
8806                 Zubaan      Mozez Singh   NaN  364619

[8807 rows x 4 columns]


In [76]:
df2.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 8807 entries, 0 to 8806
Data columns (total 4 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   title     8807 non-null   object 
 1   director  6173 non-null   object 
 2   imdb      0 non-null      float64
 3   views     8807 non-null   int32  
dtypes: float64(1), int32(1), object(2)
memory usage: 240.9+ KB
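Two mechanisms are at work in the `df2` construction: a scalar such as `np.nan` is broadcast to every row, and Series passed as columns are aligned on their index labels. The sketch below uses a few titles and directors from the output above and swaps in NumPy's newer `default_rng` generator (an alternative, not what the cell uses), whose `integers` method returns int64. The legacy `np.random.randint` defaults to the C `long` type, which is 32-bit on Windows; that is a likely reason `views` appears as `int32` in `df2.info()`.

```python
import numpy as np
import pandas as pd

title = pd.Series(["Zoom", "Zubaan", "Zodiac"])
# Deliberately shorter: label 2 has no director in this sketch
director = pd.Series(["Peter Hewitt", "Mozez Singh"])

rng = np.random.default_rng(seed=0)  # seeded for reproducibility
frame = pd.DataFrame({
    "title": title,
    "director": director,            # aligned on index; missing label -> NaN
    "imdb": np.nan,                  # scalar broadcast to every row
    "views": rng.integers(1, 1_000_000, size=len(title)),  # upper bound excluded
})
print(frame)
print(frame.dtypes)
```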


In [77]:
df2.index

RangeIndex(start=0, stop=8807, step=1)

In [78]:
df2.columns

Index(['title', 'director', 'imdb', 'views'], dtype='object')

In [79]:
df2.values

array([['Dick Johnson Is Dead', 'Kirsten Johnson', nan, 655091],
       ['Blood & Water', nan, nan, 290],
       ['Ganglands', 'Julien Leclercq', nan, 437529],
       ...,
       ['Zombieland', 'Ruben Fleischer', nan, 5335],
       ['Zoom', 'Peter Hewitt', nan, 490185],
       ['Zubaan', 'Mozez Singh', nan, 364619]],
      shape=(8807, 4), dtype=object)

In [80]:
title1 = df['title']

In [81]:
print(title1.head())

0     Dick Johnson Is Dead
1            Blood & Water
2                Ganglands
3    Jailbirds New Orleans
4             Kota Factory
Name: title, dtype: object


In [82]:
type(title1)

pandas.core.series.Series

In [83]:
title2 = df[['title']]

In [84]:
print(title2.head())

                   title
0   Dick Johnson Is Dead
1          Blood & Water
2              Ganglands
3  Jailbirds New Orleans
4           Kota Factory


In [85]:
type(title2)

pandas.core.frame.DataFrame

In [86]:
print(df[['title', 'director']])

                      title         director
0      Dick Johnson Is Dead  Kirsten Johnson
1             Blood & Water              NaN
2                 Ganglands  Julien Leclercq
3     Jailbirds New Orleans              NaN
4              Kota Factory              NaN
...                     ...              ...
8802                 Zodiac    David Fincher
8803            Zombie Dumb              NaN
8804             Zombieland  Ruben Fleischer
8805                   Zoom     Peter Hewitt
8806                 Zubaan      Mozez Singh

[8807 rows x 2 columns]


In [87]:
tutori, \
type(tutori)

(                          nume  grad birou  varsta departament
 data_angajarii                                                
 2010-01-10        Adrian Cosma    Sl  A236      43         COM
 2011-05-12       Mihai Popescu    Sl  B123      35         MEO
 2012-09-10         Florin Vlad  Prof  C314      56          EA
 2014-01-10      Radu Marinescu  Prof  A210      62         COM,
 pandas.core.frame.DataFrame)

In [88]:
tutori.loc['2012-09-10', 'nume'], \
type(tutori.loc['2012-09-10', 'nume'])

('Florin Vlad', str)

In [89]:
tutori.loc['2010-01-10', 'nume'], \
type(tutori.loc['2010-01-10', 'nume'])

('Adrian Cosma', str)

In [90]:
tutori.loc[['2010-01-10','2012-09-10'], 'nume'], \
type(tutori.loc[['2010-01-10','2012-09-10'], 'nume'])

(data_angajarii
 2010-01-10    Adrian Cosma
 2012-09-10     Florin Vlad
 Name: nume, dtype: object,
 pandas.core.series.Series)

In [91]:
tutori.loc['2010-01-10':'2012-09-10', 'nume'], \
type(tutori.loc['2010-01-10':'2012-09-10', 'nume'])

(data_angajarii
 2010-01-10     Adrian Cosma
 2011-05-12    Mihai Popescu
 2012-09-10      Florin Vlad
 Name: nume, dtype: object,
 pandas.core.series.Series)

In [92]:
tutori.loc['2010-01-10':'2012-09-10', ['nume', 'birou']], \
type(tutori.loc['2010-01-10':'2012-09-10', ['nume', 'birou']])

(                         nume birou
 data_angajarii                     
 2010-01-10       Adrian Cosma  A236
 2011-05-12      Mihai Popescu  B123
 2012-09-10        Florin Vlad  C314,
 pandas.core.frame.DataFrame)

In [93]:
tutori.loc['2010-01-10':'2012-09-10', 'nume':'birou'], \
type(tutori.loc['2010-01-10':'2012-09-10', 'nume':'birou'])

(                         nume  grad birou
 data_angajarii                           
 2010-01-10       Adrian Cosma    Sl  A236
 2011-05-12      Mihai Popescu    Sl  B123
 2012-09-10        Florin Vlad  Prof  C314,
 pandas.core.frame.DataFrame)

In [94]:
tutori.loc[:, 'nume':'birou'], \
type(tutori.loc[:, 'nume':'birou'])

(                          nume  grad birou
 data_angajarii                            
 2010-01-10        Adrian Cosma    Sl  A236
 2011-05-12       Mihai Popescu    Sl  B123
 2012-09-10         Florin Vlad  Prof  C314
 2014-01-10      Radu Marinescu  Prof  A210,
 pandas.core.frame.DataFrame)

In [95]:
tutori.iloc[0, 0], \
type(tutori.iloc[0, 0])

('Adrian Cosma', str)

In [96]:
tutori.iloc[[0, 2], 0], \
type(tutori.iloc[[0, 2], 0])

(data_angajarii
 2010-01-10    Adrian Cosma
 2012-09-10     Florin Vlad
 Name: nume, dtype: object,
 pandas.core.series.Series)

In [97]:
tutori.iloc[0:3, 0], \
type(tutori.iloc[0:3, 0])

(data_angajarii
 2010-01-10     Adrian Cosma
 2011-05-12    Mihai Popescu
 2012-09-10      Florin Vlad
 Name: nume, dtype: object,
 pandas.core.series.Series)

In [98]:
tutori.iloc[0:3, [0, 2]], \
type(tutori.iloc[0:3, [0, 2]])

(                         nume birou
 data_angajarii                     
 2010-01-10       Adrian Cosma  A236
 2011-05-12      Mihai Popescu  B123
 2012-09-10        Florin Vlad  C314,
 pandas.core.frame.DataFrame)

In [99]:
tutori.iloc[:, 0:3], \
type(tutori.iloc[:, 0:3])

(                          nume  grad birou
 data_angajarii                            
 2010-01-10        Adrian Cosma    Sl  A236
 2011-05-12       Mihai Popescu    Sl  B123
 2012-09-10         Florin Vlad  Prof  C314
 2014-01-10      Radu Marinescu  Prof  A210,
 pandas.core.frame.DataFrame)
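The cells above show the main difference between the two indexers: `loc` slices by label and includes the end label, while `iloc` slices by position and excludes the end position. With a `DatetimeIndex`, `loc` also accepts a partial date string. A sketch that rebuilds two columns of `tutori` with the same dates:

```python
import pandas as pd

# Same hiring dates as tutori above; two columns are enough here
staff = pd.DataFrame(
    {"nume": ["Adrian Cosma", "Mihai Popescu", "Florin Vlad", "Radu Marinescu"],
     "birou": ["A236", "B123", "C314", "A210"]},
    index=pd.to_datetime(["2010-01-10", "2011-05-12", "2012-09-10", "2014-01-10"]),
)

by_label = staff.loc["2010-01-10":"2012-09-10"]  # end label included -> 3 rows
by_position = staff.iloc[0:2]                     # end position excluded -> 2 rows
in_2011 = staff.loc["2011"]                       # partial date string: all rows in 2011

print(len(by_label), len(by_position), len(in_2011))
```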

In [100]:
orase, \
orase.shape

(          nume          judet
 0    Timisoara          Timis
 1  Cluj-Napoca            NaN
 2    Bucuresti            NaN
 3         Iasi           Iasi
 4    Constanta      Constanta
 5       Brasov         Brasov
 6        Anina  Caras-Severin,
 (7, 2))

In [101]:
orase['populatie'] = [329003, 324267, 2121794, 376180, 313931, 289646, 9172]
orase['suprafata'] = np.nan

In [102]:
orase, \
orase.shape

(          nume          judet  populatie  suprafata
 0    Timisoara          Timis     329003        NaN
 1  Cluj-Napoca            NaN     324267        NaN
 2    Bucuresti            NaN    2121794        NaN
 3         Iasi           Iasi     376180        NaN
 4    Constanta      Constanta     313931        NaN
 5       Brasov         Brasov     289646        NaN
 6        Anina  Caras-Severin       9172        NaN,
 (7, 4))

In [None]:
# add a new record (one row); 'suprafata' is not given, so it will be NaN
orase_noi = pd.DataFrame([{'nume': 'Ploiesti', 'populatie': 228550, 'judet': 'Prahova'}])

# use pd.concat() to append the new record to the existing DataFrame
# (DataFrame.append() was removed in pandas 2.0); ignore_index=True renumbers the rows
orase = pd.concat([orase, orase_noi], ignore_index=True)

In [104]:
orase, \
orase.shape

(          nume          judet  populatie  suprafata
 0    Timisoara          Timis     329003        NaN
 1  Cluj-Napoca            NaN     324267        NaN
 2    Bucuresti            NaN    2121794        NaN
 3         Iasi           Iasi     376180        NaN
 4    Constanta      Constanta     313931        NaN
 5       Brasov         Brasov     289646        NaN
 6        Anina  Caras-Severin       9172        NaN
 7     Ploiesti        Prahova     228550        NaN,
 (8, 4))
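`pd.concat` takes the union of the columns of its inputs and fills the gaps with NaN. For a single row and a default 0..n-1 index, assigning to a new label with `.loc` is a common alternative. A sketch on two of the cities above:

```python
import pandas as pd

cities = pd.DataFrame({"nume": ["Timisoara", "Anina"], "populatie": [329003, 9172]})
new_row = pd.DataFrame([{"nume": "Ploiesti", "populatie": 228550, "judet": "Prahova"}])

# concat: union of the columns; values absent from an input become NaN
combined = pd.concat([cities, new_row], ignore_index=True)
print(combined)

# Alternative for one row when the index is 0..n-1:
# assigning to a new label with .loc enlarges the frame in place
cities.loc[len(cities)] = ["Ploiesti", 228550]
print(cities)
```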

In [105]:
df.index

RangeIndex(start=0, stop=8807, step=1)

In [106]:
tutori.index

DatetimeIndex(['2010-01-10', '2011-05-12', '2012-09-10', '2014-01-10'], dtype='datetime64[ns]', name='data_angajarii', freq=None)

In [107]:
df.columns

Index(['show_id', 'type', 'title', 'director', 'cast', 'country', 'date_added',
       'release_year', 'rating', 'duration', 'listed_in', 'description'],
      dtype='object')

In [108]:
tutori.dtypes

nume           object
grad           object
birou          object
varsta          int64
departament    object
dtype: object

In [109]:
type(tutori.info)

method

In [110]:
type(tutori.info())

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 4 entries, 2010-01-10 to 2014-01-10
Data columns (total 5 columns):
 #   Column       Non-Null Count  Dtype 
---  ------       --------------  ----- 
 0   nume         4 non-null      object
 1   grad         4 non-null      object
 2   birou        4 non-null      object
 3   varsta       4 non-null      int64 
 4   departament  4 non-null      object
dtypes: int64(1), object(4)
memory usage: 364.0+ bytes


NoneType
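`info()` writes its report to standard output and returns `None`, which is why the cell above shows the report followed by `NoneType`. To keep the report as text, pass a writable buffer through the `buf` parameter. In pandas 2.x, the `+` after "memory usage" marks a lower bound, because the contents of object columns are not measured unless `memory_usage="deep"` is requested. A sketch with two rows of hypothetical data:

```python
import io

import pandas as pd

frame = pd.DataFrame({"nume": ["Adrian Cosma", "Mihai Popescu"], "varsta": [43, 35]})

buffer = io.StringIO()
result = frame.info(buf=buffer)  # write the report into the buffer
report = buffer.getvalue()

print(result)                    # None: info() writes, it does not return
print(report)

# Also measure the contents of object columns
frame.info(memory_usage="deep")
```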

In [111]:
tutori.values

array([['Adrian Cosma', 'Sl', 'A236', 43, 'COM'],
       ['Mihai Popescu', 'Sl', 'B123', 35, 'MEO'],
       ['Florin Vlad', 'Prof', 'C314', 56, 'EA'],
       ['Radu Marinescu', 'Prof', 'A210', 62, 'COM']], dtype=object)

In [112]:
df.size

105684

In [113]:
df.shape

(8807, 12)

In [114]:
df.axes

[RangeIndex(start=0, stop=8807, step=1),
 Index(['show_id', 'type', 'title', 'director', 'cast', 'country', 'date_added',
        'release_year', 'rating', 'duration', 'listed_in', 'description'],
       dtype='object')]

In [115]:
df.ndim

2
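These attributes are related: `size` counts every cell, so it equals rows times columns (8807 × 12 = 105684 above), and `axes` holds one index per dimension (the row index, then the column index). A quick check on a hypothetical 3 × 2 frame:

```python
import pandas as pd

frame = pd.DataFrame({"a": [1, 2, 3], "b": ["x", "y", "z"]})  # hypothetical data

rows, cols = frame.shape
print(frame.size == rows * cols)      # size counts every cell
print(frame.ndim == len(frame.axes))  # one axis per dimension
print(frame.axes[0].equals(frame.index), frame.axes[1].equals(frame.columns))
```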

In [116]:
orase

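|   | nume | judet | populatie | suprafata |
| --- | --- | --- | --- | --- |
| 0 | Timisoara | Timis | 329003 | NaN |
| 1 | Cluj-Napoca | NaN | 324267 | NaN |
| 2 | Bucuresti | NaN | 2121794 | NaN |
| 3 | Iasi | Iasi | 376180 | NaN |
| 4 | Constanta | Constanta | 313931 | NaN |
| 5 | Brasov | Brasov | 289646 | NaN |
| 6 | Anina | Caras-Severin | 9172 | NaN |
| 7 | Ploiesti | Prahova | 228550 | NaN |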


In [117]:
orase.iloc[0]

nume         Timisoara
judet            Timis
populatie       329003
suprafata          NaN
Name: 0, dtype: object
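`iloc` (like `loc`) is an indexer object used with square brackets; calling it with parentheses only returns the indexer itself. A short sketch, on two of the cities above, of two common forms of row access: `iloc[0]` returns the row as a Series labeled by the column names (mixed column types are upcast to `object`), while `iloc[[0]]` keeps a one-row DataFrame.

```python
import numpy as np
import pandas as pd

cities = pd.DataFrame({
    "nume": ["Timisoara", "Anina"],
    "populatie": [329003, 9172],
    "suprafata": [np.nan, np.nan],
})

row = cities.iloc[0]        # Series; its name is the row label 0
one_row = cities.iloc[[0]]  # DataFrame with a single row

print(type(row).__name__, row.dtype, row.name)
print(type(one_row).__name__, one_row.shape)
```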