# Series and Columns

In [32]:
import pandas as pd
houses = pd.read_csv("../data/kc_house_data.csv")
titanic = pd.read_csv("../data/titanic.csv")
netflix = pd.read_csv("../data/netflix_titles.csv", sep="|", index_col=0)

## Selecting a single column
There are two syntaxes for selecting a column:
- **df.column_name**
- **df["column_name"]** - the safer option: it works for any column name, including names that contain spaces or clash with an existing DataFrame attribute or method
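A minimal sketch of where the two syntaxes differ, using a small made-up DataFrame (the column names are illustrative and do not come from the datasets loaded above). Attribute access cannot reach a column whose name contains a space, and it returns the method instead of the column when the name collides with one, such as `count`.

```python
import pandas as pd

# Hypothetical toy data, only for illustration
df = pd.DataFrame({
    "count": [3, 5, 8],                 # clashes with the DataFrame.count method
    "first name": ["Ann", "Bo", "Cy"],  # contains a space
})

# Bracket notation always returns the column as a Series
counts = df["count"]
names = df["first name"]

# Attribute access resolves to the method, not the column
attr = df.count
is_method = callable(attr)

print(type(counts).__name__)  # Series
print(is_method)              # True
```

For simple names such as `name` or `price`, both forms return the same Series, so the attribute form remains a convenient shortcut.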

In [33]:
titanic.name

0                         Allen, Miss. Elisabeth Walton
1                        Allison, Master. Hudson Trevor
2                          Allison, Miss. Helen Loraine
3                  Allison, Mr. Hudson Joshua Creighton
4       Allison, Mrs. Hudson J C (Bessie Waldo Daniels)
                             ...                       
1304                               Zabour, Miss. Hileni
1305                              Zabour, Miss. Thamine
1306                          Zakarian, Mr. Mapriededer
1307                                Zakarian, Mr. Ortin
1308                                 Zimmerman, Mr. Leo
Name: name, Length: 1309, dtype: object

In [34]:
titanic["name"] # preferred: works for any column name and never clashes with DataFrame attributes

0                         Allen, Miss. Elisabeth Walton
1                        Allison, Master. Hudson Trevor
2                          Allison, Miss. Helen Loraine
3                  Allison, Mr. Hudson Joshua Creighton
4       Allison, Mrs. Hudson J C (Bessie Waldo Daniels)
                             ...                       
1304                               Zabour, Miss. Hileni
1305                              Zabour, Miss. Thamine
1306                          Zakarian, Mr. Mapriededer
1307                                Zakarian, Mr. Ortin
1308                                 Zimmerman, Mr. Leo
Name: name, Length: 1309, dtype: object

## Series
A Series is a one-dimensional labeled array in pandas, which can hold various data types (such as integers, floats, or strings). It can be thought of as a single column in a DataFrame, but it is independent and has its own index, which labels each data point.

### Key points:
- **Index**: Each element in a Series has a corresponding index label that identifies it. The index can be the default integer index (0, 1, 2, ...) or a set of custom labels, depending on how the Series is created.
    Example: titanic.name returns a Series whose index holds the row numbers and whose values are the passenger names from the Titanic dataset.
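As a small illustration of the two kinds of index, the sketch below builds one Series with the default integer index and one with custom labels. The values and labels are made up for the example.

```python
import pandas as pd

# Default index: positions 0, 1, 2
default_idx = pd.Series([10, 20, 30])

# Custom index: labels chosen by hand (hypothetical values)
custom_idx = pd.Series([10, 20, 30], index=["a", "b", "c"])

print(list(default_idx.index))  # [0, 1, 2]
print(list(custom_idx.index))   # ['a', 'b', 'c']
print(custom_idx["b"])          # 20
```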

### Methods in a Series
You can apply various aggregation methods like **min()**, **max()**, **sum()**, and **count()** directly on a Series to analyze its data.
    
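Example: houses.mean(numeric_only=True) is called on the whole DataFrame, so it returns a Series holding the mean of every numeric column; non-numeric columns are skipped. To get the mean house price alone, call the method on that column: houses.price.mean().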
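The difference between aggregating a Series and aggregating a DataFrame can be sketched on a tiny, made-up table (the `listings` name and its values are assumptions for the example, not data from `houses`):

```python
import pandas as pd

# Hypothetical listings, not taken from the houses dataset
listings = pd.DataFrame({
    "price": [100.0, 250.0, 175.0, 250.0],
    "bedrooms": [2, 4, 3, 4],
    "city": ["Kent", "Seattle", "Kent", "Seattle"],
})

# On a Series, each aggregation returns a single value
price = listings["price"]
print(price.min())    # 100.0
print(price.max())    # 250.0
print(price.sum())    # 775.0
print(price.count())  # 4
print(price.mean())   # 193.75

# On a DataFrame, the call returns a Series with one value per numeric column
column_means = listings.mean(numeric_only=True)
print(list(column_means.index))  # ['price', 'bedrooms']
```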

### Common attributes and methods for Series:
- **series.shape**: Returns a tuple representing the dimensions of the Series. Because a Series is always one-dimensional, it returns (number_of_rows,).
    Example: series.shape could return (100,), meaning the Series has 100 data points.
- **series.values**: Provides the underlying data of the Series as a NumPy array.
    Example: series.values will return the actual data without the index labels.
- **series.index**: Returns the index (labels) of the Series. This helps to understand what labels are associated with each data point.
    Example: series.index might return a range of integers or a custom set of labels, depending on how the Series was created.

In [35]:
titanic.name

0                         Allen, Miss. Elisabeth Walton
1                        Allison, Master. Hudson Trevor
2                          Allison, Miss. Helen Loraine
3                  Allison, Mr. Hudson Joshua Creighton
4       Allison, Mrs. Hudson J C (Bessie Waldo Daniels)
                             ...                       
1304                               Zabour, Miss. Hileni
1305                              Zabour, Miss. Thamine
1306                          Zakarian, Mr. Mapriededer
1307                                Zakarian, Mr. Ortin
1308                                 Zimmerman, Mr. Leo
Name: name, Length: 1309, dtype: object

In [36]:
houses.mean(numeric_only=True)

id               4.580302e+09
price            5.400881e+05
bedrooms         3.370842e+00
bathrooms        2.114757e+00
sqft_living      2.079900e+03
sqft_lot         1.510697e+04
floors           1.494309e+00
waterfront       7.541757e-03
view             2.343034e-01
condition        3.409430e+00
grade            7.656873e+00
sqft_above       1.788391e+03
sqft_basement    2.915090e+02
yr_built         1.971005e+03
yr_renovated     8.440226e+01
zipcode          9.807794e+04
lat              4.756005e+01
long            -1.222139e+02
sqft_living15    1.986552e+03
sqft_lot15       1.276846e+04
dtype: float64

In [37]:
prices = houses.price
prices.sum()  # numeric_only is not needed: the Series is already numeric

np.float64(11672925008.0)

In [38]:
prices.shape

(21613,)

In [39]:
prices.values

array([221900., 538000., 180000., ..., 402101., 400000., 325000.],
      shape=(21613,))

In [40]:
prices.index

RangeIndex(start=0, stop=21613, step=1)

## Important Series methods
- head()
- tail()
- describe()
- unique()
- nunique()
- nlargest()
- nsmallest()
- value_counts()
- plot()
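A short sketch of several of these methods on a made-up Series (the values are hypothetical and chosen so the results are easy to check by hand). `plot()` is left out because it needs a plotting backend.

```python
import pandas as pd

# Hypothetical ages, only for illustration
ages = pd.Series([22, 35, 22, 58, 41, 35, 22])

print(ages.head(3).tolist())       # first three values: [22, 35, 22]
print(ages.unique().tolist())      # distinct values in order of appearance: [22, 35, 58, 41]
print(ages.nunique())              # number of distinct values: 4
print(ages.nlargest(2).tolist())   # two largest values: [58, 41]
print(ages.nsmallest(2).tolist())  # two smallest values: [22, 22]

# value_counts() counts occurrences, most frequent first
counts = ages.value_counts()
print(counts.index[0], counts.iloc[0])  # 22 3
```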

In [42]:
age = titanic.age
age.head()

0        29
1    0.9167
2         2
3        30
4        25
Name: age, dtype: object

In [43]:
age.tail()

1304    14.5
1305       ?
1306    26.5
1307      27
1308      29
Name: age, dtype: object

In [44]:
age.describe()

count     1309
unique      99
top          ?
freq       263
Name: age, dtype: object
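The summary above lists count, unique, top and freq instead of a mean and quartiles because the column has dtype object: the `?` entries force pandas to keep every value as text. A minimal sketch of one way to get numeric statistics, assuming `?` stands for an unknown age, is to convert with `pd.to_numeric` and `errors="coerce"`. The small Series below is a stand-in for the real column.

```python
import pandas as pd

# Stand-in for the age column: numbers stored as text, "?" for unknown
raw_age = pd.Series(["29", "0.9167", "2", "?", "25"], dtype=object)

# errors="coerce" turns anything that cannot be parsed (here "?") into NaN
numeric_age = pd.to_numeric(raw_age, errors="coerce")

print(numeric_age.dtype)         # float64
print(numeric_age.isna().sum())  # 1

# describe() now reports numeric statistics
print(list(numeric_age.describe().index))
# ['count', 'mean', 'std', 'min', '25%', '50%', '75%', 'max']
```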