# Setup

In [None]:
from datetime import datetime
import pandas as pd

df = pd.DataFrame({
    'name': ["Tom", "Lisa", "Peter"],
    'height': [1.68, 1.93, 1.72],
    'weight': [48.4, 89.8, 84.2],
    'id': [1, 2, 3],
    'city': ['Stuttgart', 'Stuttgart', 'Berlin']
})

# Basics

## Data Types with .dtypes

In [None]:
df.dtypes

## Data Types with .info()

In [None]:
df.info()

# Change Data Types

## Standard methods

- There are several ways to [change data types in pandas](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.astype.html).

- The most common method is `.astype()`:

- `.astype()`: Convert to a specific type (like "`int32`", "`float`" or "`category`")

- `.astype(str)`: Convert to string  
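
As a minimal sketch of `.astype()`, the example below converts one column to a smaller integer type and another from text to float. The `example` frame and its column names are hypothetical and separate from `df` above.

```python
import pandas as pd

# Hypothetical frame for illustration only (not the df used in these slides)
example = pd.DataFrame({"count": [1, 2, 3], "price": ["1.5", "2.0", "3.25"]})

# Convert a single column to a specific integer type
example["count"] = example["count"].astype("int32")

# Convert selected columns at once by passing a dict of column -> dtype
example = example.astype({"price": "float64"})

print(example.dtypes)
```

Passing a dict keeps all other columns unchanged, which can be convenient when only a few columns need a new type.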
  

## More options
  
- `to_datetime`: Convert argument to datetime.
- `to_timedelta`: Convert argument to timedelta.
- `to_numeric`: Convert argument to a numeric type.
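
The three functions listed above can be sketched as follows. The `raw` frame and its values are made up for illustration; `errors="coerce"` is one possible way to handle unparseable entries, not the only one.

```python
import pandas as pd

# Hypothetical raw data where every column arrives as text
raw = pd.DataFrame({
    "amount": ["10", "20.5", "n/a"],
    "day": ["2024-01-15", "2024-02-01", "2024-03-10"],
    "duration": ["1 days", "2 hours", "30 minutes"],
})

# errors="coerce" replaces values that cannot be parsed with NaN instead of raising
raw["amount"] = pd.to_numeric(raw["amount"], errors="coerce")

# Parse text into datetime values using an explicit format
raw["day"] = pd.to_datetime(raw["day"], format="%Y-%m-%d")

# Parse text into time differences
raw["duration"] = pd.to_timedelta(raw["duration"])

print(raw.dtypes)
```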


# Categorical Data and Strings

## What is categorical data?

- Categoricals are a pandas data type corresponding to categorical variables in statistics. 

- A categorical variable takes on a limited, and usually fixed, number of possible values (categories). 

- Examples are gender, social class, blood type, country affiliation, observation time or rating via Likert scales.
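
To illustrate the idea, here is a small sketch with a hypothetical blood type series. It shows that a categorical stores a fixed set of categories plus one integer code per row.

```python
import pandas as pd

# Hypothetical blood type observations stored as a categorical
blood = pd.Series(["A", "0", "AB", "A", "B", "A"], dtype="category")

print(blood.cat.categories)   # the distinct categories
print(blood.cat.codes)        # the integer code stored for each row
print(blood.value_counts())   # how often each category occurs
```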

## Convert to categorical data


- Convert variable "name" to a category dtype:



In [None]:
df["name"] = df["name"].astype("category")

In [None]:
df.info()

## String data

- In our example, `id` is stored as a number, but calculations with it would not be meaningful

- It is just a unique identifier, so we should convert it to a simple string (object)



In [None]:
df['id'] = df['id'].astype(str)

In [None]:
df.info()

# Add new columns

## Add a constant number

- Add a new variable called "number" to df 

- The new variable should have the number 42 in all rows



In [None]:
df["number"] = 42

In [None]:
df.head()

## Add from existing columns

- Create new columns from existing columns, for example the body mass index from `weight` and `height`




In [None]:
# calculate body mass index: weight divided by height squared, rounded to 2 decimals
df['bmi'] = (df['weight'] / df['height'] ** 2).round(2)

In [None]:
df

# Add Dates

## Add a date with strftime

- To add today's date, we can use `datetime` and [strftime](https://strftime.org), which formats the date as a string (see code examples on the next slides):



In [None]:
df["date"] = datetime.today().strftime('%Y-%m-%d')

In [None]:
df.head(3)
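
Because `strftime` returns text, the `date` column above holds strings rather than datetime values. If date arithmetic or the `.dt` accessor is needed, one option is to parse the column with `pd.to_datetime`. The sketch below uses a hypothetical `events` frame so it can run on its own.

```python
from datetime import datetime
import pandas as pd

# Hypothetical frame for illustration only
events = pd.DataFrame({"name": ["Tom", "Lisa"]})

# As in the cell above, strftime stores the date as text
events["date"] = datetime.today().strftime("%Y-%m-%d")

# Parse the text into a datetime64 column
events["date"] = pd.to_datetime(events["date"], format="%Y-%m-%d")

# Datetime columns expose components through the .dt accessor
events["year"] = events["date"].dt.year

print(events.dtypes)
```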