<a href="https://colab.research.google.com/github/OptimalDecisions/sports-analytics-foundations/blob/main/pandas-basics/Pandas_Basics_2_1_Data_Structures.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

 ## Pandas Basics 2.1



# Data Structures

  <img src = "../img/sa_logo.png" width="100" align="left">

  Ram Narasimhan

  <br><br><br>

  << [GroupBy](Pandas_Intermediate_2_10_GroupBy.ipynb) | [Data Structures](Pandas_Basics_2_1_Data_Structures.ipynb) | [Reading Data from files](Pandas_Basics_2_2_Reading_Files.ipynb) >>




## Series

In Pandas, a Series is a one-dimensional labeled array capable of holding any data type. You can think of it as a single column in a spreadsheet: a sequence of values, each paired with a label from the index.


In [14]:
import pandas as pd

# Build a Series from a list of player names
players = pd.Series(['Pele', 'Lionel Messi', 'Cristiano Ronaldo', 'Diego Maradona', 'Zinedine Zidane', 'Thierry Henry', 'Ronaldo Nazario'])

# Print the series
print(players)

0                 Pele
1         Lionel Messi
2    Cristiano Ronaldo
3       Diego Maradona
4      Zinedine Zidane
5        Thierry Henry
6      Ronaldo Nazario
dtype: object


### Values and index

**Values**

The values in a Pandas Series represent the actual data. These can be of any data type, including integers, floats, strings, or even more complex objects like other Pandas objects.

In [15]:
players.values

array(['Pele', 'Lionel Messi', 'Cristiano Ronaldo', 'Diego Maradona',
       'Zinedine Zidane', 'Thierry Henry', 'Ronaldo Nazario'],
      dtype=object)
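
To illustrate that the values need not be strings, here is a minimal sketch with a numeric Series. The goal counts are made-up numbers used only for illustration, and the variable name `goals` is an assumption, not part of the example above.

```python
import pandas as pd

# Hypothetical goal counts, for illustration only
goals = pd.Series([12, 30, 25])

print(goals.values)  # the underlying NumPy array of values
print(goals.dtype)   # an integer dtype, inferred from the data
```

Because every value is an integer, Pandas infers an integer dtype instead of the generic `object` dtype used for the player names.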

**Index**

The index in a Pandas Series provides a label for each value. It's similar to the row labels in a spreadsheet or the keys in a dictionary.

By default, if you don't specify an index, Pandas will create a default integer-based index starting from 0. However, you can set a custom index based on your needs.

In the example below, the default integer index is assigned to the player names:

In [19]:
print(players.index)

RangeIndex(start=0, stop=7, step=1)


We can use an index label to refer to a particular element in this Series, using square brackets. With the default integer index, the label `3` selects the fourth element:

In [20]:
players[3]

'Diego Maradona'
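
The custom index mentioned earlier can be sketched as follows. The short string labels (`'pele'`, `'messi'`, `'maradona'`) are hypothetical and chosen only for this illustration. The sketch also contrasts label-based lookup with position-based lookup via `iloc`.

```python
import pandas as pd

# The index labels below are illustrative assumptions
players = pd.Series(
    ['Pele', 'Lionel Messi', 'Diego Maradona'],
    index=['pele', 'messi', 'maradona'],
)

print(players.index)        # the custom labels
print(players['maradona'])  # look up a value by its label
print(players.iloc[0])      # look up a value by its position
```

With a custom index, square brackets take the label, while `iloc` always takes the integer position regardless of the labels.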

## Data Frames

In [9]:
import pandas as pd

data = {
    'Player': ['Pele', 'Lionel Messi', 'Cristiano Ronaldo', 'Diego Maradona', 'Zinedine Zidane', 'Thierry Henry', 'Ronaldo Nazario'],
    'Country': ['Brazil', 'Argentina', 'Portugal', 'Argentina', 'France', 'France', 'Brazil'],
    'Lifetime Goals': [1283, 821, 890, 345, 126, 360, 352],  # These numbers are approximations and need verification
    'Retirement': [1977, None, None, 1997, 2006, 2014, 2011],  # Update with the actual retirement years
    'World Cup Wins': [3, 1, 0, 1, 1, 1, 2]  # Update with the actual number of World Cup wins
}


df = pd.DataFrame(data)

# Print the result
print(df)


              Player    Country  Lifetime Goals  Retirement  World Cup Wins
0               Pele     Brazil            1283      1977.0               3
1       Lionel Messi  Argentina             821         NaN               1
2  Cristiano Ronaldo   Portugal             890         NaN               0
3     Diego Maradona  Argentina             345      1997.0               1
4    Zinedine Zidane     France             126      2006.0               1
5      Thierry Henry     France             360      2014.0               1
6    Ronaldo Nazario     Brazil             352      2011.0               2


### Values and index

**Values**

The values in the DataFrame represent the actual data. Each column in the DataFrame corresponds to a different attribute, and each row corresponds to a different observation or record.
For example, in the 'Player' column, the values are the names of soccer players ('Pele', 'Lionel Messi', etc.). In the 'Lifetime Goals' column, the values represent the number of lifetime goals for each player.
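
As a small sketch of the column idea, selecting one column from a DataFrame returns a Series. The example rebuilds a three-row subset of the data so that it runs on its own; the variable name `retirement` is an assumption made for illustration.

```python
import pandas as pd

# A three-row subset of the data, rebuilt so the sketch is self-contained
df = pd.DataFrame({
    'Player': ['Pele', 'Lionel Messi', 'Diego Maradona'],
    'Retirement': [1977, None, 1997],
})

retirement = df['Retirement']  # selecting one column returns a Series
print(type(retirement))
print(df.dtypes)               # one dtype per column
```

The missing retirement year is stored as `NaN`, so the column holds floats. This is why the years print as `1977.0` rather than `1977`.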

**Index**

The index in a DataFrame provides labels for each row. It's similar to row labels in a spreadsheet or the keys in a dictionary.
By default, if you don't specify an index, Pandas will create a default integer-based index starting from 0.

In [22]:
df.values  # More commonly, you take the values of a single column (a Series) rather than of the whole DataFrame

array([['Pele', 'Brazil', 1283, 1977.0, 3],
       ['Lionel Messi', 'Argentina', 821, nan, 1],
       ['Cristiano Ronaldo', 'Portugal', 890, nan, 0],
       ['Diego Maradona', 'Argentina', 345, 1997.0, 1],
       ['Zinedine Zidane', 'France', 126, 2006.0, 1],
       ['Thierry Henry', 'France', 360, 2014.0, 1],
       ['Ronaldo Nazario', 'Brazil', 352, 2011.0, 2]], dtype=object)

In [23]:
df.index

RangeIndex(start=0, stop=7, step=1)
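
Since values are usually taken from one column at a time, the following sketch shows that pattern using `to_numpy()`. It uses a three-row subset of the data, and the name `wins` is an assumption made for illustration.

```python
import pandas as pd

# A three-row subset of the data, rebuilt so the sketch is self-contained
df = pd.DataFrame({
    'Player': ['Pele', 'Lionel Messi', 'Diego Maradona'],
    'World Cup Wins': [3, 1, 1],
})

wins = df['World Cup Wins'].to_numpy()  # NumPy array for a single column
print(wins)
print(wins.sum())  # numeric operations work directly on the array
```

A single numeric column yields a numeric array, whereas the whole DataFrame mixes strings and numbers and so falls back to the generic `object` dtype.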

### Label-based indexing

In Pandas, you can refer to a particular row in a DataFrame using the `loc` accessor.

Let's see how to reference the row corresponding to `Diego Maradona`.

With `loc`, you reference the row by its index label. In our example, the labels are the default integers, so Maradona's row has the label `3`.


In [24]:
df.loc[3]

Player            Diego Maradona
Country                Argentina
Lifetime Goals               345
Retirement                1997.0
World Cup Wins                 1
Name: 3, dtype: object
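
Label-based indexing becomes more readable when the index holds meaningful labels. The sketch below assumes we want player names as row labels and uses `set_index` on a three-row subset of the data; the name `by_name` is hypothetical.

```python
import pandas as pd

# A three-row subset of the data, rebuilt so the sketch is self-contained
df = pd.DataFrame({
    'Player': ['Pele', 'Lionel Messi', 'Diego Maradona'],
    'Country': ['Brazil', 'Argentina', 'Argentina'],
})

by_name = df.set_index('Player')      # use the player names as row labels

print(by_name.loc['Diego Maradona'])  # select a row by its label
print(by_name.iloc[2])                # select the same row by its position
```

Here `loc` takes the label (`'Diego Maradona'`), while `iloc` takes the integer position of the row.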

Next, let's see how to read data from files.


<< [2.0 Getting Started](Pandas_Intermediate_2_0_Getting_Started.ipynb)  |  [2.1 Data Structures](Pandas_Basics_2_1_Data_Structures.ipynb)         |    [2.2 Reading Data from files](Pandas_Basics_2_2_Reading_Files.ipynb) >>