# Organizing Data in Tables

We often have vectors of related information. If the variables contain information about a set of observations and can be arranged as a table, then we call the data tabular.
Here you will learn when and how to organize related vectors of data in a table.

## Creating Tables

The variables can have different data types, but they must all have the same number of rows (here, 4).

Figure 1

To combine the variables into a table, the entries in each row should describe the same observation. Here, the third entry of each variable identifies the 2018 Honda Odyssey, which is a minivan.
Sometimes you may have tables you want to combine. For instance, you may want to add new cars to your dataset. 

Figure 2

It would also be useful to attach numeric data about the cars in the fleet. 

Figure 3



When several variables in your workspace hold data about the same observations, you can use the table function to collect this information into a table. 

Explanatory figure, similar to Figure 4

Use the 'VariableNames' name-value argument to specify the column names. For example, the following command creates a table with columns named Thought1 and OnSecondThought:

    tbl = table(var1,var2,'VariableNames',{'Thought1','OnSecondThought'})
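The following minimal sketch shows how the pieces fit together. It uses hypothetical inline data: the makes, models, and flags are illustrative only and are not the contents of carInfo.mat. Variables of different types, each with the same number of rows, are combined into one table, and 'VariableNames' sets the column names. It assumes a MATLAB release that supports string arrays.

```matlab
% Hypothetical example data: one entry per car in every variable
make  = ["Toyota"; "Honda"; "Ford"];      % string array
model = ["Corolla"; "Odyssey"; "Escape"]; % string array
year  = [2018; 2018; 2018];               % double
isSUV = [false; false; true];             % logical

% All inputs have 3 rows; 'VariableNames' names the columns in input order
cars = table(make, model, year, isSUV, ...
    'VariableNames', {'Make','Model','Year','IsSUV'});

disp(cars)
```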
    


In this exercise, you will combine information on several 2018 cars into a single table. The dataset here is exactly the one shown in Figure 1, with four entries.

In [None]:
% Load the car data and list the variables in the workspace
load carInfo.mat
whos

:::{admonition} Task 1.1
Create a table named fleet that contains the data in the variables mk, md, yr, and tp.
:::

In [None]:
% Your code


:::{admonition} Hint
:class: note dropdown

Use the table function. The following syntax creates a table from the variables a, b, and c:

    table(a,b,c)
:::

:::{admonition} Solution
:class: tip dropdown

``` matlab
fleet = table(mk,md,yr,tp);
```
:::

One advantage of tables is that each column is a named variable, so you and others can quickly understand how the dataset is organized and access the data efficiently. To make the most of this, use descriptive variable names.

:::{admonition} Task 1.2
Again, create a table named fleet containing the data in the variables mk, md, yr, and tp. This time, give the variables the following names:

- Make
- Model
- Year
- Type
:::

In [None]:
% Your code


:::{admonition} Hint
:class: note dropdown

Use the VariableNames name-value argument to specify the desired variable names as a cell array of character vectors.
:::

:::{admonition} Solution
:class: tip dropdown

``` matlab
fleet = table(mk,md,yr,tp,'VariableNames',{'Make' 'Model' 'Year' 'Type'})
```
:::

You have created a table containing information about cars in a fleet.

You can view the properties of the table, including the variable names, by accessing its Properties with dot notation:

    fleet.Properties
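The small sketch below reads the column names from Properties and then renames the columns by assigning to Properties.VariableNames. The table T and its contents are hypothetical. Renaming through Properties goes beyond what the text above covers and is included only as an illustration.

```matlab
% Hypothetical table with two columns
T = table([1.2; 3.4], ["a"; "b"], 'VariableNames', {'Value','Label'});

% Read metadata through the Properties of the table
names = T.Properties.VariableNames;   % {'Value','Label'}

% Properties can also be assigned, for example to rename all columns
T.Properties.VariableNames = {'Measurement','Tag'};
disp(T)
```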

:::{admonition} Task 1.3

:::

In [None]:
% Your code


:::{admonition} Hint
:class: note dropdown

:::

:::{admonition} Solution
:class: tip dropdown

``` matlab

```
:::

<br>

## Combining Existing Tables

Sometimes information is spread across multiple tables. Vertical and horizontal concatenation work the same way as they do with matrices, provided the tables are compatible.

### Vertical Concatenation

For vertical concatenation, the tables must have the same variable names, and corresponding columns must have compatible data types.
Figure 5
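Here is a minimal sketch of vertical concatenation. The two tables are hypothetical and share the variable names Make and Year with matching types.

```matlab
% Two hypothetical tables with the same variable names and compatible types
cars2018 = table(["Toyota"; "Honda"], [2018; 2018], ...
    'VariableNames', {'Make','Year'});
cars2019 = table("Kia", 2019, 'VariableNames', {'Make','Year'});

% Square brackets with a semicolon stack the rows of cars2019 below cars2018
allCars = [cars2018; cars2019];
disp(allCars)
```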

### Horizontal Concatenation

For horizontal concatenation, the tables must have the same number of rows. For the result to make sense, each row must also describe the same observation in every table.

Figure 6

- If the tables do not have the same number of rows, horizontal concatenation fails.
- If the rows of the tables do not correspond to the same observations, the code runs, but the resulting table does not make sense.
- If the rows do correspond, horizontal concatenation makes sense, but you may want to remove duplicate information after concatenating.

Figure 7
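This sketch uses hypothetical data to show horizontal concatenation of two tables whose rows describe the same cars in the same order:

```matlab
% Hypothetical tables: row k of each table describes the same car
ids = table(["Toyota"; "Honda"], ["Corolla"; "Odyssey"], ...
    'VariableNames', {'Make','Model'});
dims = table([1300; 2000], [4.6; 5.2], ...
    'VariableNames', {'Weight','Length'});

% Square brackets with a space place the columns side by side.
% Both tables have 2 rows and no variable names in common.
fleetSmall = [ids dims];
disp(fleetSmall)
```

Suppose dims listed the cars in a different order. The same command would still run, but it would pair each car with the wrong measurements.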


Tables may contain information about the same observations with the rows sorted differently. Horizontal concatenation would not necessarily cause an error, but it may produce mismatched rows or redundant columns.
Figure 8
Here we want to add the weight, length, and width of each car to the corresponding row in fleet.

The join function combines two tables by matching their rows using key variables:

    newTbl = join(tbl1,tbl2,"Keys",["Var1" "Var2"])

Figure 9

The redundant columns of tbl2 (its key variables, which already appear in tbl1) are not repeated in newTbl.
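Here is a minimal sketch of join with hypothetical data, in which the two tables list the same cars in different orders. The rows are matched on Make and Model, and the key columns appear only once. The comment about row order reflects my understanding that join keeps the order of the left table.

```matlab
% Hypothetical left table: identifying information
info = table(["Honda"; "Toyota"], ["Odyssey"; "Corolla"], ["minivan"; "compact"], ...
    'VariableNames', {'Make','Model','Type'});

% Hypothetical right table: the same cars in a different order
dims = table(["Toyota"; "Honda"], ["Corolla"; "Odyssey"], [1300; 2000], ...
    'VariableNames', {'Make','Model','Weight'});

% Match rows on the key variables Make and Model.
% The keys appear once, Weight is appended from dims,
% and the rows follow the order of the left table, info.
combined = join(info, dims, "Keys", ["Make" "Model"]);
disp(combined)
```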

In this exercise, you will add the 2019 cars to the existing 2018 fleet. Suggestion: use the same data here as shown in the figures.

In [None]:
% Load the data files and list the variables in the workspace
load fleet.mat
load fleet2019.mat
load sizeData.mat
whos

:::{admonition} Task 2.1
Update the variable fleet so that it contains the cars from the original fleet stacked on top of the cars in fleet2019.
:::

In [None]:
% Your code


:::{admonition} Hint
:class: note dropdown

Use square brackets to concatenate tables, just as you would for numeric arrays.
:::

:::{admonition} Solution
:class: tip dropdown

``` matlab
fleet = [fleet; fleet2019]
```
:::

You would like to add the numeric data about the vehicles in sizeData to the fleet table, but the cars in sizeData are listed in a different order.


:::{admonition} Task 2.2
Update the variable fleet so that it includes the numeric data in sizeData. The measurements must be associated with the correct cars, and the result must contain no redundant columns.
:::

In [None]:
% Your code


:::{admonition} Hint
:class: note dropdown

Use the join function with the key variables Make and Model.
:::

:::{admonition} Solution
:class: tip dropdown

``` matlab
fleet = join(fleet,sizeData,"Keys",["Make" "Model"])
```
:::

:::{admonition} Task 2.3
What changes if you switch the order of fleet and sizeData in Task 2.2? Try swapping the inputs in your code.
:::

In [None]:
% Your code


:::{admonition} Hint
:class: note dropdown

:::

:::{admonition} Solution
:class: tip dropdown

``` matlab

```
:::

<br>

## Extracting Data from Tables

You will sometimes need to analyze only a portion of your data. There are two ways to extract data from a table:

1. Extract a portion of the original table as a table itself.
2. Extract the contents of a table. For instance, from a column of numeric values you can extract a numeric array.

The best choice depends on whether you need the structure of a table or an array of a single, homogeneous data type.
Figure 10

Most tables contain more data than you need for a particular analysis, so you may want to extract a smaller piece and leave out the extraneous information.
You can index into a table with parentheses, just as you would index into an array. Columns can be specified either by number or by name.

Figure 11

As with any array indexing using parentheses, the result has the same data type as the original data, so indexing a table with parentheses returns a table.
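Here is a minimal sketch of parentheses indexing on a hypothetical table. Selecting the columns by number and by name gives the same result, and in both cases the result is still a table.

```matlab
% Hypothetical table of three cars
T = table(["Toyota"; "Honda"; "Ford"], [1300; 2000; 1600], [4.6; 5.2; 4.5], ...
    'VariableNames', {'Make','Weight','Length'});

% Rows 1 and 3, columns 2 and 3 selected by number ...
byNumber = T([1 3], 2:3);
% ... or the same columns selected by name with a string array
byName = T([1 3], ["Weight" "Length"]);

disp(class(byNumber))   % table
```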



You can index into a table using the numerical indices of the rows you want. This dataset is again similar to the previous one (it probably makes sense to work with the same dataset throughout). It has the columns Make (the company that manufactured the car), Model, Year, Type (SUV, minivan, or compact), Weight, Length, and Width.

In [None]:
% Load the fleet table and display it
load fleet.mat
fleet

:::{admonition} Task 3.1
Create a variable named smallCars that contains the last four columns of fleet for the car in row 1 and the cars in rows 4 through the end.
:::

In [None]:
% Your code


:::{admonition} Hint
:class: note dropdown

The end keyword can be used to get the last rows or columns when indexing.
:::

:::{admonition} Solution
:class: tip dropdown

``` matlab
smallCars = fleet([1 4:end],end-3:end)
```
:::

You can also specify columns by name using a string array.

:::{admonition} Task 3.2
Create a table named sizes that contains the columns of fleet named Weight, Length, and Width.
:::

In [None]:
% Your code


:::{admonition} Hint
:class: note dropdown

You can use the : operator to extract an entire row or column.
:::

:::{admonition} Solution
:class: tip dropdown

``` matlab
sizes = fleet(:,["Weight", "Length", "Width"])
```
:::


:::{admonition} Task 3.3
Try creating a table that contains all of the Toyota cars.
:::

In [None]:
% Your code


:::{admonition} Hint
:class: note dropdown

:::

:::{admonition} Solution
:class: tip dropdown

``` matlab

```
:::

### Extracting Data from a Single Column Using Dot Notation

To analyze the data in a table, you often need to extract its contents before passing them to another function, such as plot.
You can extract the contents of a table variable by indexing with dot notation and the variable name:

    variableData = tableName.VariableName

Figure 12

The result of dot indexing has the data type of the underlying data.
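This short sketch uses a hypothetical table to show that dot indexing returns the underlying array rather than a table. That array can be passed straight to a numeric function such as mean.

```matlab
% Hypothetical table of three cars
T = table(["Toyota"; "Honda"; "Ford"], [1300; 2000; 1600], ...
    'VariableNames', {'Make','Weight'});

w = T.Weight;         % 3-by-1 double, not a table
makes = T.Make;       % 3-by-1 string array

avgWeight = mean(w);  % numeric functions accept the extracted array directly
```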


### Extracting Data from Multiple Columns Using Curly Braces

You can also index with curly braces to extract the contents of multiple columns. You specify the rows and columns by number or by name:

    variableData = tableName{rowIndices,colIndices}

Here, rowIndices is a numeric vector, and colIndices is either a numeric vector or a string array of column names. You can also use logical vectors to specify rows and columns.

Figure 13

The result of curly-brace indexing has the data type of the underlying data. When extracting from multiple columns, be careful to extract only columns with compatible data types, because their contents are combined into a single array.
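This sketch, again with hypothetical data, uses curly braces to pull three numeric columns into one matrix. It also combines a logical row index with a single column name.

```matlab
% Hypothetical table mixing a text column with numeric columns
T = table(["Toyota"; "Honda"], [1300; 2000], [4.6; 5.2], [1.8; 2.0], ...
    'VariableNames', {'Make','Weight','Length','Width'});

% All three selected columns are double, so the result is a 2-by-3 double matrix
dims = T{:, ["Weight" "Length" "Width"]};

% A logical vector selects the rows: here, the makes of cars heavier than 1500
heavy = T{T.Weight > 1500, "Make"};
```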

 

In [None]:
% Load the fleet table and display it
load fleet.mat
fleet

:::{admonition} Task 3.4
Create a string variable called models containing the Model column of fleet.
:::

In [None]:
% Your code


:::{admonition} Hint
:class: note dropdown

Use dot indexing with the name of the column you want.
:::

:::{admonition} Solution
:class: tip dropdown

``` matlab
models = fleet.Model
```
:::


:::{admonition} Task 3.5
Create a numeric variable called w containing the Weight column of fleet.
:::

In [None]:
% Your code


:::{admonition} Hint
:class: note dropdown

Use dot indexing with the name of the column you want.
:::

:::{admonition} Solution
:class: tip dropdown

``` matlab
w = fleet.Weight
```
:::

You can also use dot notation to update data in a table:

    tbl.Var = updatedColumn
    tbl.Var(2) = updatedValue
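Here is a minimal sketch of both update forms on a hypothetical table:

```matlab
% Hypothetical table
T = table(["Toyota"; "Honda"], [2018; 2018], 'VariableNames', {'Make','Year'});

% Replace an entire column with updated values
T.Year = T.Year + 1;

% Replace a single element of a column (row 2 of Make)
T.Make(2) = "Kia";
disp(T)
```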

:::{admonition} Task 3.6
Convert the data in the Weight column from kilograms to pounds. One kilogram is equal to 2.2046 pounds.
:::

In [None]:
% Your code


:::{admonition} Hint
:class: note dropdown

Use dot indexing to specify the Weight column, and assign the converted weights to that column. (The new values should be a little more than twice the original values.)
:::

:::{admonition} Solution
:class: tip dropdown

``` matlab
% w holds the original weights in kilograms (from Task 3.5)
fleet.Weight = w*2.2046
```
:::

You can also use dot notation to create a new column in a table:

    tbl.newVar = newColumn
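For example, with a hypothetical table of lengths and widths, assigning to a new name appends a column. The column name Ratio is made up for this illustration.

```matlab
% Hypothetical table of lengths and widths
T = table([4.6; 5.2], [1.8; 2.0], 'VariableNames', {'Length','Width'});

% Assigning to a name that does not exist yet appends a new column.
% The right-hand side has one element per row of T.
T.Ratio = T.Length ./ T.Width;
disp(T)
```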

:::{admonition} Task 3.7
Add a new column called Area to the table fleet by multiplying each element in the column Length by the corresponding element in the column Width.
:::

In [None]:
% Your code


:::{admonition} Hint
:class: note dropdown

Use elementwise multiplication to multiply each element of Length by the corresponding element of Width.
:::

:::{admonition} Solution
:class: tip dropdown

``` matlab
fleet.Area = fleet.Length.*fleet.Width
```
:::

You can use curly brackets to extract data from multiple columns.

:::{admonition} Task 3.8
Create a double array called data containing all of the numeric physical properties from the table fleet. Then use the mean function to find the average of each of these variables, and save the result to the variable m.
:::

In [None]:
% Your code


:::{admonition} Hint
:class: note dropdown

The physical properties are in the Weight, Length, Width, and Area columns. The mean function returns the average of each column of a numeric array.
:::

:::{admonition} Solution
:class: tip dropdown

``` matlab
data = fleet{:,end-3:end}
m = mean(data)
```
:::

Once a column is extracted, you can use it to make logical comparisons. The resulting logical vector can be used to subset the original table.
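Here is a minimal sketch of this pattern with hypothetical data. Comparing a column produces a logical vector, which is then used as a row index with parentheses so that the result stays a table.

```matlab
% Hypothetical table
T = table(["Toyota"; "Honda"; "Ford"], ["compact"; "minivan"; "compact"], ...
    'VariableNames', {'Make','Type'});

% Comparing an extracted column gives one logical value per row
isCompact = T.Type == "compact";

% Using the logical vector as a row index with parentheses keeps a table
compactCars = T(isCompact, :);
disp(compactCars)
```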

:::{admonition} Task 3.9
Create a logical vector named idx identifying each of the compact cars. Use idx to create a string array called smallCars containing the makes and models of the compact cars.
:::

In [None]:
% Your code


:::{admonition} Hint
:class: note dropdown

Create idx by comparing each element of Type to "compact".
:::

:::{admonition} Solution
:class: tip dropdown

``` matlab
idx = fleet.Type == "compact"
smallCars = fleet{idx,["Make" "Model"]}
```
:::

The vartype function lets you select all columns of a specific data type. For instance,

    T{:,vartype("categorical")}

extracts the data from all of the categorical columns and returns a categorical array.
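This small sketch uses a hypothetical table and selects with vartype("numeric") instead of "categorical", so that it works without categorical columns. Parentheses return the matching columns as a table, while curly braces return their contents as one array.

```matlab
% Hypothetical table with one string column and two numeric columns
T = table(["Toyota"; "Honda"], [2018; 2019], [1300; 2000], ...
    'VariableNames', {'Make','Year','Weight'});

% vartype("numeric") matches every numeric column (here Year and Weight)
numTable = T(:, vartype("numeric"));   % parentheses: a 2-column table
numArray = T{:, vartype("numeric")};   % curly braces: a 2-by-2 double matrix
```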

:::{admonition} Task 3.10
Try extracting all of the variables of type string and passing the result to the join function to create a list of cars. (Here, join combines the strings in each row of a string array rather than joining two tables.)
:::

In [None]:
% Your code


:::{admonition} Hint
:class: note dropdown

:::

:::{admonition} Solution
:class: tip dropdown

``` matlab

```
:::