# Exercises Part I

Make a file named pandas_series.py or pandas_series.ipynb for the following exercises.

Use pandas to create a Series named fruits from the following list:

```
["kiwi", "mango", "strawberry", "pineapple", "gala apple", "honeycrisp apple", "tomato", "watermelon", "honeydew", "kiwi", "kiwi", "kiwi", "mango", "blueberry", "blackberry", "gooseberry", "papaya"]
```

In [18]:
import pandas as pd
import numpy as np
from pydataset import data

In [32]:
fruit_list = ["kiwi", "mango", "strawberry", "pineapple", "gala_apple", "honeycrisp_apple", "tomato", "watermelon", "honeydew", "kiwi", "kiwi", "kiwi", "mango", "blueberry", "blackberry", "gooseberry", "papaya"]

fruit_series = pd.Series(fruit_list)

fruit_series

0                 kiwi
1                mango
2           strawberry
3            pineapple
4           gala_apple
5     honeycrisp_apple
6               tomato
7           watermelon
8             honeydew
9                 kiwi
10                kiwi
11                kiwi
12               mango
13           blueberry
14          blackberry
15          gooseberry
16              papaya
dtype: object

In [26]:
# Determine the number of elements in fruits.

fruit_series.size

17

In [31]:
# Output only the index from fruits.

fruit_series.index

RangeIndex(start=0, stop=17, step=1)

In [34]:
# Output only the values from fruits.

fruit_series.values

array(['kiwi', 'mango', 'strawberry', 'pineapple', 'gala_apple',
       'honeycrisp_apple', 'tomato', 'watermelon', 'honeydew', 'kiwi',
       'kiwi', 'kiwi', 'mango', 'blueberry', 'blackberry', 'gooseberry',
       'papaya'], dtype=object)

In [38]:
# Confirm the data type of the values in fruits.

fruit_series.dtype

dtype('O')

In [58]:
# Output only the first five values from fruits.
print(fruit_series.head())

0          kiwi
1         mango
2    strawberry
3     pineapple
4    gala_apple
dtype: object


In [59]:
# Output the last three values.

print(fruit_series.tail(3))

14    blackberry
15    gooseberry
16        papaya
dtype: object


In [60]:
# Output two random values from fruits.

print(fruit_series.sample(2))

9                kiwi
5    honeycrisp_apple
dtype: object


In [47]:
# Run the .describe() on fruits to see what information it returns when called on a Series with string values.

fruit_series.describe()

count       17
unique      13
top       kiwi
freq         4
dtype: object

In [49]:
# Run the code necessary to produce only the unique string values from fruits.

fruit_series.unique()

array(['kiwi', 'mango', 'strawberry', 'pineapple', 'gala_apple',
       'honeycrisp_apple', 'tomato', 'watermelon', 'honeydew',
       'blueberry', 'blackberry', 'gooseberry', 'papaya'], dtype=object)

In [50]:
# Determine how many times each unique string value occurs in fruits.

fruit_series.value_counts()

kiwi                4
mango               2
strawberry          1
pineapple           1
gala_apple          1
honeycrisp_apple    1
tomato              1
watermelon          1
honeydew            1
blueberry           1
blackberry          1
gooseberry          1
papaya              1
dtype: int64

In [51]:
# Determine the string value that occurs most frequently in fruits.

fruit_series.value_counts().head(1)

kiwi    4
dtype: int64
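`value_counts().head(1)` shows only one row, even when several values tie for the highest count. Below is a minimal sketch of two alternatives. It uses a short hypothetical list with a deliberate tie. `idxmax()` returns a single label, while `Series.mode()` returns every value that shares the top count.

```python
import pandas as pd

# Hypothetical short list: "kiwi" and "mango" both appear twice
fruits = pd.Series(["kiwi", "mango", "kiwi", "mango", "papaya"])

# idxmax() gives one label with the highest count (only one, even if tied)
top_label = fruits.value_counts().idxmax()

# mode() gives every value that ties for the highest count
modes = fruits.mode()

print(top_label)
print(modes.tolist())
```

On the full fruits list both approaches agree, because "kiwi" is the only value that appears 4 times.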

In [57]:
# Determine the string value that occurs least frequently in fruits.

fruit_series.value_counts().tail(1)

# If several values tie for the lowest count, show them all

fruit_series.value_counts().nsmallest(n=1, keep='all')

strawberry          1
pineapple           1
gala_apple          1
honeycrisp_apple    1
tomato              1
watermelon          1
honeydew            1
blueberry           1
blackberry          1
gooseberry          1
papaya              1
dtype: int64

# Pandas Workbook

In [1]:
import pandas as pd
import numpy as np
from pydataset import data

In [3]:
my_df = data('sleepstudy')

In [16]:
my_series = my_df['Reaction']

# It is a Series because it holds a single column of values (plus an index of row labels)

my_series

1      249.5600
2      258.7047
3      250.8006
4      321.4398
5      356.8519
         ...   
176    329.6076
177    334.4818
178    343.2199
179    369.1417
180    364.1236
Name: Reaction, Length: 180, dtype: float64

In [10]:
my_list = list(range(1, 11))

list_as_series = pd.Series(my_list)

list_as_series

0     1
1     2
2     3
3     4
4     5
5     6
6     7
7     8
8     9
9    10
dtype: int64

In [12]:
# Creating a pandas series from a dictionary

my_dict = {'a': 1, 'b': 2, 'c': 3}

my_dict_series = pd.Series(my_dict)

my_dict_series

a    1
b    2
c    3
dtype: int64

In [14]:
# Drop the entry labeled 'c' (value 3); drop() returns a new Series and leaves my_dict_series unchanged

my_dict_series.drop('c')

a    1
b    2
dtype: int64

In [15]:
# Change the data type from int64 to float64

my_dict_series.astype('float64')

a    1.0
b    2.0
c    3.0
dtype: float64

In [17]:
# Check the data type: still int64, because astype() returns a new Series and did not modify my_dict_series

my_dict_series.dtype

dtype('int64')
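Because drop() and astype() return new Series, their results are lost unless you assign them to a name. The sketch below rebuilds my_dict_series and keeps both changes by reassigning. The names `dropped`, `as_float`, and `original_dtype` are illustrative.

```python
import pandas as pd

my_dict_series = pd.Series({'a': 1, 'b': 2, 'c': 3})

# Each call returns a new Series; my_dict_series itself is untouched
dropped = my_dict_series.drop('c')
as_float = my_dict_series.astype('float64')
original_dtype = my_dict_series.dtype  # still int64

# Reassign to keep the changes (the two methods can be chained)
my_dict_series = my_dict_series.drop('c').astype('float64')
print(my_dict_series)
```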

# Pandas Notes (Series)



### Pandas Series:
*Convert to a pandas Series*

```pd.Series()```

A pandas Series is one-dimensional (plus an index that labels each row).

One-dimensional (Series)

        [a][b][c][d]

Two-dimensional (DataFrame)

        [a][b][c][d]
        [a][b][c][d]
        [a][b][c][d]
        [a][b][c][d]
        
In the Pandas library, a Series is a one-dimensional labeled array that can hold data of any type (integers, floats, strings, Python objects, and so on). It is similar to a NumPy array but adds labeled indexing: each element has an associated index label, which makes lookups, alignment, and other data manipulation easier. A Series commonly represents a single column of a DataFrame, Pandas' two-dimensional data structure.
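Here is a short sketch of labeled indexing, using a hypothetical `prices` Series. Pass `index=` to pd.Series() to set the labels. You can then look up elements by label with `[]` or `.loc[]`, or by integer position with `.iloc[]`.

```python
import pandas as pd

# Hypothetical prices, labeled by fruit name instead of 0, 1, 2
prices = pd.Series([0.5, 1.25, 3.0], index=['kiwi', 'mango', 'papaya'])

print(prices['mango'])       # 1.25, label-based
print(prices.loc['papaya'])  # 3.0, label-based via .loc
print(prices.iloc[0])        # 0.5, position-based via .iloc
```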

### Attributes:
A Pandas Series has attributes that describe its data. Common ones are index (the index labels of the Series), values (the underlying data, usually as a NumPy array), dtype (the data type of the elements), and size (the number of elements).

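                .method()   is called with parentheses, e.g. fruit_series.head()
                .attribute  is accessed without them, e.g. fruit_series.dtype
                
- attributes describe properties of a structure, such as its size, dtype, or index; methods perform an action and can take arguments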
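The sketch below shows the attribute/method distinction on a small illustrative Series. Attributes are read without parentheses. Methods are called with them and may take arguments, such as the `2` passed to head().

```python
import pandas as pd

s = pd.Series(["kiwi", "mango", "kiwi"])

# Attributes: no parentheses; they describe the Series
print(s.size)    # 3
print(s.index)   # RangeIndex(start=0, stop=3, step=1)
print(s.dtype)

# Methods: parentheses; they perform work and can take arguments
print(s.head(2))
print(s.value_counts())
```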

### Binning values:
Binning is a data preprocessing technique that groups continuous numerical values into discrete intervals, or bins. It is useful for summarizing a wide range of values, for example turning exact reaction times into a few categories. Pandas provides the function pd.cut() for binning a Series. You pass either the bin edges or the number of equal-width bins, and it returns categorical data that you can then count or aggregate.
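Below is a minimal sketch of pd.cut() with both ways of specifying bins. The reaction times are illustrative values, not taken from the sleepstudy data. The bin labels 'fast', 'medium', and 'slow' are hypothetical names.

```python
import pandas as pd

# Illustrative reaction times (ms), not taken from the sleepstudy data
reaction = pd.Series([249.6, 258.7, 321.4, 356.9, 410.2])

# Explicit edges give the bins (200, 300], (300, 400], (400, 500];
# the labels are hypothetical names for each bin
binned = pd.cut(reaction, bins=[200, 300, 400, 500],
                labels=['fast', 'medium', 'slow'])

# An integer instead asks for that many equal-width bins
equal_width = pd.cut(reaction, bins=3)

print(binned.value_counts())
```

Intervals are closed on the right by default, so a value of exactly 300 would fall in the first bin.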

### Summarizing a Series:
Pandas offers several built-in methods to summarize a Series, including mean(), sum(), min(), max(), count(), and describe(). These compute statistics such as the mean, total, minimum, maximum, and number of non-missing values. describe() returns several statistics at once. For numeric data it reports count, mean, standard deviation, min, the 25%/50%/75% quartiles, and max. For string data it reports count, unique, top, and freq, as in the fruits example above.
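Here is a small sketch of these summaries on illustrative numeric values that include one missing entry. It shows that count() excludes missing values while size does not.

```python
import numpy as np
import pandas as pd

# Illustrative values with one missing entry
s = pd.Series([2.0, 4.0, 4.0, 10.0, np.nan])

print(s.mean())          # 5.0 (missing values are skipped)
print(s.sum())           # 20.0
print(s.min(), s.max())  # 2.0 10.0
print(s.count())         # 4, the number of non-missing values (s.size is 5)
print(s.describe())
```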

### Vectorized operation using a user-defined function:
Pandas supports vectorized operations. Arithmetic and comparison operators, as well as NumPy ufuncs, act on an entire Series at once without an explicit Python loop, which is both faster and simpler. A user-defined function can be applied element-wise with apply() or map(). This saves you from writing the loop yourself, but pandas still calls the function once per element, so it is usually slower than built-in vectorized operations. If the function body itself uses only vectorized operations, you can call it on the whole Series directly. In every case the result is a new Series with the transformed values, aligned to the original index.
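The sketch below contrasts the three approaches. `label_parity` and `min_max_scale` are hypothetical helper names chosen for illustration.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4])

# Vectorized: the operator acts on the whole Series at once
doubled = s * 2

# Element-wise UDF: pandas calls label_parity once per value
def label_parity(n):
    """Return 'even' or 'odd' for a single number (illustrative helper)."""
    return 'even' if n % 2 == 0 else 'odd'

parity_apply = s.apply(label_parity)
parity_map = s.map(label_parity)

# A UDF built only from vectorized operations can take the whole Series
def min_max_scale(x):
    """Rescale a Series to the range [0, 1] (illustrative helper)."""
    return (x - x.min()) / (x.max() - x.min())

scaled = min_max_scale(s)
print(scaled.tolist())
```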

*Different ways to access a column*

        sleep_df['column_name']
        sleep_df.column_name     # only if the name is a valid Python identifier and not an existing attribute or method
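The sketch below compares both access styles on a tiny stand-in for sleep_df. Its values are illustrative, and only the column names follow the sleepstudy data. It also shows a hypothetical column named `count`, which clashes with a DataFrame method.

```python
import pandas as pd

# Small stand-in for sleep_df (illustrative values); column names follow sleepstudy
sleep_df = pd.DataFrame({'Reaction': [249.56, 258.70], 'Days': [0, 1]})

by_bracket = sleep_df['Reaction']
by_attribute = sleep_df.Reaction
print(by_bracket.equals(by_attribute))  # True: same column either way

# A column named like an existing method is only reachable with brackets
counts_df = pd.DataFrame({'count': [1, 2]})
print(callable(counts_df.count))      # True: this is the DataFrame.count method
print(counts_df['count'].tolist())    # [1, 2]
```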

*Change the data type of a pandas Series*

        .astype(dtype)   # pass one type, e.g. int, float, bool, object
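Here is a brief sketch of astype() with each of the types listed above, on small illustrative Series.

```python
import pandas as pd

s = pd.Series(['1', '2', '3'])   # numbers stored as strings

as_int = s.astype(int)
as_float = s.astype(float)
as_bool = pd.Series([0, 1, 2]).astype(bool)    # 0 -> False, nonzero -> True
as_object = pd.Series([1, 2]).astype(object)

print(as_int.tolist(), as_float.tolist(), as_bool.tolist())
```

If any string cannot be parsed as a number, astype(int) raises a ValueError. pd.to_numeric(s, errors='coerce') turns such values into NaN instead.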

*Combining comparisons: & means and, | (the pipe symbol) means or; wrap each comparison in parentheses*

        (x > 1) & (x < 35)

        (x > 1) | (x < 35)
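Each comparison produces a boolean Series, and & and | combine two of them element-wise. The parentheses are required because & and | bind more tightly than > and <. Python's own `and`/`or` raise a ValueError on a Series. The `|` example above is true for every number, since any value is either above 1 or below 35. The illustrative sketch below therefore uses a different pair of conditions to show the filter.

```python
import pandas as pd

x = pd.Series([0, 5, 20, 35, 50])

in_range = (x > 1) & (x < 35)   # True where both conditions hold
outside = (x < 1) | (x > 35)    # True where at least one condition holds

print(x[in_range].tolist())  # [5, 20]
print(x[outside].tolist())   # [0, 50]
```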

*Series method examples*
```
        .head()         - first 5 values by default
        .tail()         - last 5 values by default
        .sample()       - 1 random value by default
        .value_counts() - count of each unique value, most frequent first
        .sort_values()  - values sorted in ascending order by default
        .describe()     - count, mean, std, min, 25%, 50%, 75%, max for numbers; count, unique, top, freq for strings
```
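sort_values() is the only method in this list not demonstrated in the exercises. Here is a short sketch on an illustrative labeled Series. It also includes sort_index(), a related method that orders by label instead of by value.

```python
import pandas as pd

s = pd.Series([3, 1, 2], index=['c', 'a', 'b'])

ascending = s.sort_values()                  # values 1, 2, 3; each label moves with its value
descending = s.sort_values(ascending=False)  # values 3, 2, 1
by_label = s.sort_index()                    # labels a, b, c

print(ascending)
```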


