# Data Manipulation: Basics

This notebook reviews some pandas functions that may be useful, though they are not always needed.

In [1]:
import pandas as pd
import numpy as np

In [22]:
data = {
     'name': ['Xavier', 'Ann', 'Jana', 'Yi', 'Robin', 'Amal', 'Nori'],
     'city': ['Mexico City', 'Toronto', 'Prague', 'Shanghai',
              'Manchester', 'Cairo', 'Osaka'],
     'age': [41, 28, 33, 34, 38, 31, 37],
     'py-score': [88.0, 79.0, 81.0, 80.0, 68.0, 61.0, 84.0]
 }

row_labels = [101, 102, 103, 104, 105, 106, 107]
df = pd.DataFrame(data=data, index=row_labels)

df

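|     | name   | city        | age | py-score |
|-----|--------|-------------|-----|----------|
| 101 | Xavier | Mexico City | 41  | 88.0     |
| 102 | Ann    | Toronto     | 28  | 79.0     |
| 103 | Jana   | Prague      | 33  | 81.0     |
| 104 | Yi     | Shanghai    | 34  | 80.0     |
| 105 | Robin  | Manchester  | 38  | 68.0     |
| 106 | Amal   | Cairo       | 31  | 61.0     |
| 107 | Nori   | Osaka       | 37  | 84.0     |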


## Labels, Types and Sizes

These attributes and methods are less essential, but they can be useful in some cases.

#### Labels as sequences

You can get the labels as sequences using `df.index` or `df.columns`. You can
replace the whole sequence at once, but you cannot change individual values
in place.

In [23]:
df.index

Index([101, 102, 103, 104, 105, 106, 107], dtype='int64')

In [24]:
df.columns

Index(['name', 'city', 'age', 'py-score'], dtype='object')

In [28]:
df.index = np.arange(10,17)
df.index

Index([10, 11, 12, 13, 14, 15, 16], dtype='int64')

In [14]:
df.columns[0] = "different" # error

TypeError: Index does not support mutable operations
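
Since an `Index` cannot be modified element by element, a single label has to be changed indirectly. The following is a minimal sketch of two common ways to do that, using a small illustrative DataFrame (the column name `first_name` is a made-up example, not part of the data above):

```python
import pandas as pd

# Small illustrative DataFrame (not the full dataset used above).
df = pd.DataFrame(
    {'name': ['Xavier', 'Ann'], 'age': [41, 28]},
    index=[101, 102],
)

# Option 1: rename() returns a new DataFrame with only the given label changed.
renamed = df.rename(columns={'name': 'first_name'})

# Option 2: copy the labels to a list, edit the list, and assign it back whole.
labels = df.columns.tolist()
labels[0] = 'different'
df.columns = labels

print(renamed.columns.tolist())  # ['first_name', 'age']
print(df.columns.tolist())       # ['different', 'age']
```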

#### Data Types and Sizes

The following attributes and methods are handy here:

- `df.dtypes` returns a Series with the data type of each column.
- `df.astype()` returns a copy with the data types changed as requested.
- `df.ndim` returns the number of dimensions: 2 for a DataFrame and 1 for a Series.
- `df.shape` returns a tuple with the number of elements along each dimension (rows, columns).
- `df.size` returns the total number of elements (rows × columns).

In [29]:
df.dtypes

name         object
city         object
age           int64
py-score    float64
dtype: object

In [30]:
df = df.astype(dtype={'py-score': np.float32})
df.dtypes

name         object
city         object
age           int64
py-score    float32
dtype: object

In [31]:
print(df.ndim)
print(df.shape)
print(df.size)

2
(7, 4)
28


## Indexing and Slicing



When you need only a single value, pandas recommends using the specialized accessors `.at[]` (label-based) and `.iat[]` (position-based).
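
As a short sketch of the two accessors, the example below rebuilds a reduced version of the DataFrame with row labels 10 to 16 (an assumption that matches the index set earlier) and reads the same cell by label and by position:

```python
import pandas as pd

# Reduced copy of the data, with row labels 10..16.
df = pd.DataFrame(
    {
        'name': ['Xavier', 'Ann', 'Jana', 'Yi', 'Robin', 'Amal', 'Nori'],
        'py-score': [88.0, 79.0, 81.0, 80.0, 68.0, 61.0, 84.0],
    },
    index=range(10, 17),
)

by_label = df.at[12, 'name']   # row label 12, column label 'name'
by_position = df.iat[2, 0]     # third row, first column (zero-based positions)

print(by_label, by_position)   # Jana Jana
```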

With `.iloc[]`, the stop index of a slice is exclusive, so a slice such as `df.iloc[1:6]` returns only positions 1 through 5. With `.loc[]`, however, both the start and stop labels are inclusive.
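
The difference between the two slicing rules can be illustrated with a small sketch. It assumes a one-column DataFrame with row labels 10 to 16, so that position 1 corresponds to label 11:

```python
import pandas as pd

# One-column DataFrame with row labels 10..16.
df = pd.DataFrame(
    {'name': ['Xavier', 'Ann', 'Jana', 'Yi', 'Robin', 'Amal', 'Nori']},
    index=range(10, 17),
)

by_position = df.iloc[1:6]   # positions 1, 2, 3, 4, 5 (position 6 is excluded)
by_label = df.loc[11:15]     # labels 11, 12, 13, 14, 15 (label 15 is included)

print(by_position.index.tolist())  # [11, 12, 13, 14, 15]
print(by_label.index.tolist())     # [11, 12, 13, 14, 15]
```

Both slices select the same five rows here, even though the stop values differ (6 versus 15).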

You can skip rows and columns with `.iloc[]` by including a step in the slice: `1:6:2` goes from position 1 up to (but not including) position 6 in steps of 2.
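
A minimal sketch of a stepped slice, again assuming a one-column DataFrame with row labels 10 to 16:

```python
import pandas as pd

# One-column DataFrame with row labels 10..16.
df = pd.DataFrame(
    {'name': ['Xavier', 'Ann', 'Jana', 'Yi', 'Robin', 'Amal', 'Nori']},
    index=range(10, 17),
)

# Rows at positions 1, 3 and 5 of the first column.
every_other = df.iloc[1:6:2, 0]

print(every_other.index.tolist())  # [11, 13, 15]
print(every_other.tolist())        # ['Ann', 'Yi', 'Amal']
```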

Note: Don’t use tuples instead of lists or integer arrays to get ordinary rows or columns. Tuples are reserved for representing multiple dimensions in NumPy and pandas, as well as hierarchical, or multi-level, indexing in pandas.
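
To make the note concrete, here is a small sketch (with an assumed two-column DataFrame labeled 10 to 12) showing that a list selects several rows, while a tuple is read as one key per dimension, that is, a row label and a column label:

```python
import pandas as pd

# Small illustrative DataFrame with row labels 10..12.
df = pd.DataFrame(
    {'name': ['Xavier', 'Ann', 'Jana'], 'age': [41, 28, 33]},
    index=[10, 11, 12],
)

# A list of labels selects several rows.
rows = df.loc[[10, 11]]

# A tuple is interpreted as (row label, column label),
# so this is the same as df.loc[10, 'name'].
cell = df.loc[(10, 'name')]

print(rows.index.tolist())  # [10, 11]
print(cell)                 # Xavier
```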