## Check the setup and connect to the database

```python
%run "010-check_setup.ipynb"
```

## Use HANA DataFrame and Pandas DataFrame

List database tables from the schema `TITANIC` using [the `get_tables()` method](https://help.sap.com/doc/cd94b08fe2e041c2ba778374572ddba9/latest/en-US/hana_ml.dataframe.html#hana_ml.dataframe.ConnectionContext.get_tables).

```python
myconn.get_tables(schema='TITANIC')
```

A table with data already exists in your SAP HANA database, so you use [the `table()` method](https://help.sap.com/doc/cd94b08fe2e041c2ba778374572ddba9/latest/en-US/hana_ml.dataframe.html#hana_ml.dataframe.ConnectionContext.table) to instantiate a HANA DataFrame that points to this existing database table.

```python
hdf_train = myconn.table('DATA_LABELED', schema='TITANIC')
```

You can always use [the `select_statement` property](https://help.sap.com/doc/cd94b08fe2e041c2ba778374572ddba9/latest/en-US/hana_ml.dataframe.html#hana_ml.dataframe.DataFrame) to inspect the SQL SELECT statement that backs a HANA DataFrame.

```python
hdf_train.select_statement
```

The [**HANA DataFrame** object in Python](https://help.sap.com/doc/cd94b08fe2e041c2ba778374572ddba9/latest/en-US/hana_ml.html#sap-hana-dataframe) represents only an SQL SELECT statement; it does not store any data...

```python
hdf_train_first10recs = hdf_train.head(10)
```

```python
hdf_train_first10recs.select_statement
```

...until [the `collect()` method](https://help.sap.com/doc/cd94b08fe2e041c2ba778374572ddba9/latest/en-US/hana_ml.dataframe.html#hana_ml.dataframe.DataFrame.collect) is executed, which runs the statement in SAP HANA and returns the result to the client side as a Pandas DataFrame.

```python
hdf_train_first10recs.collect()
```
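To make the lazy-evaluation idea concrete, here is a minimal toy analogue built on Python's standard `sqlite3` module instead of SAP HANA. The class `LazyFrame`, its methods, and the in-memory table contents are hypothetical and only mimic the pattern: each method returns a new object holding a longer SELECT statement, and nothing is read from the database until `collect()` runs. This is not how `hana_ml` is implemented, and this toy `collect()` returns a list of tuples rather than a Pandas DataFrame.

```python
import sqlite3


class LazyFrame:
    """Hypothetical toy analogue of a HANA DataFrame: it stores SQL, not data."""

    def __init__(self, conn, select_statement):
        self.conn = conn
        self.select_statement = select_statement

    def head(self, n):
        # Wrap the current statement in a new one; no query is executed here.
        return LazyFrame(self.conn, f"SELECT * FROM ({self.select_statement}) LIMIT {int(n)}")

    def collect(self):
        # Only now is the SQL sent to the database and the rows fetched.
        return self.conn.execute(self.select_statement).fetchall()


# In-memory database with a hypothetical stand-in table (not the Titanic data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE DATA_LABELED (PassengerId INTEGER, PClass INTEGER)")
conn.executemany(
    "INSERT INTO DATA_LABELED VALUES (?, ?)",
    [(i, 1 + i % 3) for i in range(1, 21)],
)

train = LazyFrame(conn, "SELECT * FROM DATA_LABELED")
first10 = train.head(10)
print(first10.select_statement)  # only SQL text so far
print(len(first10.collect()))    # rows are fetched here
```

Note that `train` itself is unchanged by `head()`; each step produces a new object, which is why you can inspect `select_statement` at any point before deciding to collect.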

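You use [HANA `DataFrame` methods](https://help.sap.com/doc/cd94b08fe2e041c2ba778374572ddba9/latest/en-US/hana_ml.dataframe.html#hana_ml.dataframe.DataFrame) to query the data in the SAP HANA database. Methods such as `value_counts()` and `sort()` return a new HANA DataFrame with its own SELECT statement, so the processing happens in the database.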

```python
print(hdf_train.value_counts(['PClass']).select_statement)
```

```python
hdf_train.value_counts(['PClass']).collect()
```

```python
print(hdf_train.value_counts(['PClass']).sort('NUM_PClass', desc=True).select_statement)
```

```python
hdf_train.value_counts(['PClass']).sort('NUM_PClass', desc=True).collect()
```
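The counting and sorting above are pushed down to the database, roughly as a GROUP BY count with an ORDER BY. The sketch below writes such a query by hand against a hypothetical in-memory `sqlite3` table with a made-up class distribution. The statement that `hana_ml` actually generates (printed by the cells above) may differ in structure and column names, so treat this only as an approximation of the idea.

```python
import sqlite3

# Hypothetical stand-in table; the PClass values are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE DATA_LABELED (PassengerId INTEGER, PClass INTEGER)")
pclass_values = [1] * 3 + [2] * 2 + [3] * 5
conn.executemany(
    "INSERT INTO DATA_LABELED VALUES (?, ?)",
    list(enumerate(pclass_values, start=1)),
)

# Count rows per class (like value_counts) and order by that count, descending
# (like .sort('NUM_PClass', desc=True)). All of the work runs inside the database.
query = """
SELECT PClass, COUNT(*) AS NUM_PClass
FROM DATA_LABELED
GROUP BY PClass
ORDER BY NUM_PClass DESC
"""
rows = conn.execute(query).fetchall()
print(rows)  # [(3, 5), (1, 3), (2, 2)]
```

Only the three aggregated rows travel back to the client, not the ten underlying records, which is the main benefit of expressing the operation on the HANA DataFrame before calling `collect()`.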

You use [**Pandas `DataFrame` and/or `Series`**](https://pandas.pydata.org/docs/user_guide/10min.html#minutes-to-pandas) methods to process the data that the `collect()` method has returned to the client. These operations run in Python on the client side, not in SAP HANA.

```python
hdf_train.value_counts(['PClass']).collect().sort_values('NUM_PClass')
```
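After `collect()`, the result is an ordinary Pandas DataFrame, so `sort_values()` and any `Series` arithmetic run in the local Python process. The sketch below builds a small Pandas DataFrame by hand with the two column names used above (`PClass`, `NUM_PClass`). The counts are made up, and the real `value_counts()` output may use different column names for the values. It assumes Pandas is installed.

```python
import pandas as pd

# Hand-made stand-in for a collected value_counts() result; counts are hypothetical.
counts = pd.DataFrame({"PClass": [1, 2, 3], "NUM_PClass": [3, 2, 5]})

# Client-side sort, ascending by default (same call as in the cell above).
ascending = counts.sort_values("NUM_PClass")

# Client-side Series operation: share of rows per class, indexed by PClass.
share = counts.set_index("PClass")["NUM_PClass"] / counts["NUM_PClass"].sum()

print(ascending)
print(share.round(2))
```

This is convenient for small results, but every row must first be transferred to the client, so large tables are better reduced with HANA DataFrame methods before collecting.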

🤓 **Let's discuss**:
1. HANA DataFrames
2. Pandas DataFrames/Series