# Objects in Pandas

From: K.A.

**Pandas** is a high-level Python library for data analysis.

In addition to a user-friendly interface for storing data, the Pandas library implements many operations to manipulate data.


1. The *Series* object of the **Pandas** library is a one-dimensional array of indexed data. It can be created from a list or an array as follows:

In [1]:
import pandas as pd
import numpy as np
sr_1 = pd.Series([0.1, 0.2, 0.13, 0.4, 0.15])
sr_1

0    0.10
1    0.20
2    0.13
3    0.40
4    0.15
dtype: float64

In [2]:
# Values and indices
print('Values: ',sr_1.values)
print('Indices:',sr_1.index)

Values:  [0.1  0.2  0.13 0.4  0.15]
Indices: RangeIndex(start=0, stop=5, step=1)


Each value can be accessed by its index using square brackets:

In [3]:
sr_1[2]

0.13

If desired, we can use string values as an index:

In [4]:
sr_2=pd.Series([2.1, 0.2, 0.13, 0.4, 0.15],
               index=['a','b','f', 'd', 'l'])
sr_2

a    2.10
b    0.20
f    0.13
d    0.40
l    0.15
dtype: float64

### The Series object as a one-dimensional array

In [5]:
# slice by integer position (the end position is excluded)

sr_1[0:3]

0    0.10
1    0.20
2    0.13
dtype: float64

In [6]:
# masking

sr_1[(sr_1>0.1) & (sr_1<0.5)]

1    0.20
2    0.13
3    0.40
4    0.15
dtype: float64

In [8]:
# slice by explicit index labels (the end label is included)

sr_2['a':'f']

a    2.10
b    0.20
f    0.13
dtype: float64

In [9]:
sr_2[['a','d']]

a    2.1
d    0.4
dtype: float64
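The two kinds of slicing above differ in how they treat the end point. As a minimal sketch (the variable names `by_label` and `by_position` are illustrative, and `iloc` is used to make the positional slice explicit), the difference can be checked like this:

```python
import pandas as pd

# Same values and labels as sr_2 in the text
s = pd.Series([2.1, 0.2, 0.13, 0.4, 0.15],
              index=['a', 'b', 'f', 'd', 'l'])

by_label = s['a':'f']        # label slice: the end label 'f' is included
by_position = s.iloc[0:2]    # positional slice: position 2 is excluded

print(list(by_label.index))     # ['a', 'b', 'f']
print(list(by_position.index))  # ['a', 'b']
```

Label slices follow the order of the index, not alphabetical order, which is why `'a':'f'` stops at the third element here.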

The Series object in Pandas can be viewed as a specialized kind of Python dictionary:

In [10]:
data = {'Anna': 123,
        'Marie': 345,
        'Linda': 498}
sr_dict = pd.Series(data)
sr_dict

Anna     123
Marie    345
Linda    498
dtype: int64

In [11]:
# Dictionary-like methods

'Linda' in sr_dict

True

In [12]:
'Nick' in sr_dict

False

In [15]:
# keys

sr_dict.keys()

Index(['Anna', 'Marie', 'Linda'], dtype='object')

In [16]:
# items

list(sr_dict.items())

[('Anna', 123), ('Marie', 345), ('Linda', 498)]

In [18]:
# You can extend a Series by assigning a value to a new index label.
# Here the value is the string '442', so the dtype changes to object.

sr_dict['John']='442'
sr_dict

Anna     123
Marie    345
Linda    498
John     442
dtype: object
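The change from `int64` to `object` in the output above comes from assigning the string `'442'`. The sketch below reproduces the effect on a fresh Series and shows one possible way to get a numeric dtype back, using `pd.to_numeric`; the name `restored` is illustrative only.

```python
import pandas as pd

s = pd.Series({'Anna': 123, 'Marie': 345, 'Linda': 498})
dtype_before = s.dtype      # integer dtype

s['John'] = '442'           # a string, as in the cell above
dtype_after = s.dtype       # object: the Series now mixes ints and a str

# pd.to_numeric parses the values back into numbers
restored = pd.to_numeric(s)

print(dtype_before, dtype_after, restored.dtype)
```

Assigning the integer `442` instead of the string avoids the conversion step altogether.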

Indexer attributes let you state explicitly which indexing scheme to use:\
loc - index and slice by explicit labels (the end label is included)\
iloc - index and slice by implicit integer position (the end position is excluded)

In [19]:
sr_4 = pd.Series(['Montreal','Sherbrooke','Mes Amies'],
                 index=[1, 2, 3])
sr_4.loc[1:3]

1      Montreal
2    Sherbrooke
3     Mes Amies
dtype: object

In [20]:
sr_4.iloc[1:3]

2    Sherbrooke
3     Mes Amies
dtype: object
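The same distinction applies to single elements, which is where an integer index is easiest to misread. A small sketch with the same data (the variable name `cities` is illustrative):

```python
import pandas as pd

cities = pd.Series(['Montreal', 'Sherbrooke', 'Mes Amies'], index=[1, 2, 3])

first_by_label = cities.loc[1]       # label 1 -> 'Montreal'
second_by_position = cities.iloc[1]  # position 1 -> 'Sherbrooke'
last_by_position = cities.iloc[-1]   # last position -> 'Mes Amies'

print(first_by_label, second_by_position, last_by_position)
```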

2. A *DataFrame* is an indexed two-dimensional array of values in which each column is a *Series*.

Just as a two-dimensional array can be viewed as an ordered sequence of aligned columns, a *DataFrame* object can be viewed as an ordered sequence of aligned *Series* objects. By "aligned" we mean that they use the same index.
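To illustrate what alignment means in practice, here is a minimal sketch in which two Series with different index order and coverage are combined. The `capital` Series is a hypothetical addition made up for this example; pandas matches rows by label and fills missing entries with `NaN`.

```python
import pandas as pd

population = pd.Series({'Quebec': 8.5, 'Ontario': 14.2, 'Alberta': 4.3})
# Hypothetical second Series: different order, and no entry for Alberta
capital = pd.Series({'Ontario': 'Toronto', 'Quebec': 'Quebec City'})

# Rows are matched by index label, not by position
aligned = pd.DataFrame({'population': population, 'capital': capital})
print(aligned)
```

The row order of the result may differ from the order of either input, because the combined index is built from both.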

There are several ways to create *DataFrame* objects in **Pandas**:

In [22]:
# Create from a single Series object
countries = pd.Series([4.5, 5.98, 2.485],
                      index=['Quebec','Ontario','Alberta'])
df = pd.DataFrame(countries,
                  columns=['population'])
df

         population
Quebec        4.500
Ontario       5.980
Alberta       2.485


In [23]:
# Create from a list of dictionaries
records = [{'Sherbrooke':5, 'Montreal':4},
           {'Sherbrooke':4,'Montreal':4},
           {'Sherbrooke':3,'Montreal':5}]
df = pd.DataFrame(records,
                  index=['1 qt','2 qt','3 qt'])
df

      Sherbrooke  Montreal
1 qt           5         4
2 qt           4         4
3 qt           3         5


In [24]:
# Create from a dictionary of Series objects

population_dict = {'Quebec':8.5, 'Ontario':14.2,'Alberta':4.3}
population = pd.Series(population_dict)
square = pd.Series({'Quebec':1542056,'Ontario':1076395,'Alberta':661848})
df = pd.DataFrame({'population':population,
                   'square':square})
df

         population   square
Quebec          8.5  1542056
Ontario        14.2  1076395
Alberta         4.3   661848


In [26]:
# Create a DataFrame from a two-dimensional NumPy array

pd.DataFrame(np.random.rand(3, 4),columns=[1,2,3,4],index=["a","b","c"])

          1         2         3         4
a  0.500255  0.923204  0.457819  0.787789
b  0.044530  0.445209  0.250512  0.794841
c  0.476751  0.482102  0.347254  0.689024


In [27]:
# Create a DataFrame directly from a NumPy structured array

g = np.zeros(4, dtype=[("a","i8"),("b","f8")])
pd.DataFrame(g)

   a    b
0  0  0.0
1  0  0.0
2  0  0.0
3  0  0.0




---


Selecting data from a *DataFrame* object

In [28]:
# Select a column by name (the DataFrame behaves like a dictionary of Series)

df['square']

Quebec     1542056
Ontario    1076395
Alberta     661848
Name: square, dtype: int64
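The dictionary analogy also covers membership tests and adding entries. The sketch below rebuilds the same `df` and adds a derived column; the column name `density` is hypothetical, and the calculation assumes that `population` is given in millions and `square` is an area in square kilometres, which the text does not state.

```python
import pandas as pd

df = pd.DataFrame({
    'population': pd.Series({'Quebec': 8.5, 'Ontario': 14.2, 'Alberta': 4.3}),
    'square': pd.Series({'Quebec': 1542056, 'Ontario': 1076395, 'Alberta': 661848}),
})

# Assigning to a new key adds a column, as with a dictionary.
# Assumption: population is in millions, so scale it before dividing.
df['density'] = df['population'] * 1e6 / df['square']

print('density' in df)      # membership is tested against column names
print(df['density'].round(2))
```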

In [30]:

# The underlying data as a two-dimensional NumPy array
df.values

array([[8.500000e+00, 1.542056e+06],
       [1.420000e+01, 1.076395e+06],
       [4.300000e+00, 6.618480e+05]])

In [31]:
#Transpose

df.T

               Quebec    Ontario   Alberta
population        8.5       14.2       4.3
square      1542056.0  1076395.0  661848.0


In [32]:
print(df.values[0])        # first row of the underlying array
print(df['population'])    # the 'population' column as a Series

[8.500000e+00 1.542056e+06]
Quebec      8.5
Ontario    14.2
Alberta     4.3
Name: population, dtype: float64


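For a *DataFrame*, the same indexers take a row selector followed by a column selector:\
loc - select by explicit row and column labels (end labels are included)\
iloc - select by implicit integer positions (end positions are excluded)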

In [33]:
df.loc[:'Quebec', :'square']

        population   square
Quebec         8.5  1542056


In [34]:
df.iloc[:2, :1]

         population
Quebec          8.5
Ontario        14.2


In [41]:
df.loc[df.population > 6.0, ['population', 'square']]

         population   square
Quebec          8.5  1542056
Ontario        14.2  1076395


---

3. The *Index* object can be viewed either as an immutable array or as an ordered set. Each view explains some of the operations that *Index* objects support. As a simple example, let's create an *Index* from a list of integers:

In [42]:
ind = pd.Index([1, 2, 3, 4, 5])
ind

Int64Index([1, 2, 3, 4, 5], dtype='int64')

The *Index* object behaves a lot like an array:

In [43]:
print(ind[3])
print(ind[::2])

4
Int64Index([1, 3, 5], dtype='int64')
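Unlike a NumPy array, an *Index* cannot be modified in place. A short sketch of what "immutable" means here (the names `raised` and `new_ind` are illustrative):

```python
import pandas as pd

ind = pd.Index([1, 2, 3, 4, 5])

raised = False
try:
    ind[1] = 0              # in-place assignment is not supported
except TypeError as err:
    raised = True
    print('TypeError:', err)

# Methods such as insert return a new Index and leave the original untouched
new_ind = ind.insert(1, 0)
print(list(ind))            # [1, 2, 3, 4, 5]
print(list(new_ind))        # [1, 0, 2, 3, 4, 5]
```

Immutability makes it safer to share one index between several Series and DataFrames, since none of them can change it for the others.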


The *Index* object also has many of the attributes of a NumPy array:

In [44]:
print('Number of elements:', ind.size,'\n')
print('Shape:', ind.shape,'\n')
print('Dimension:', ind.ndim, '\n')
print('Data type:', ind.dtype)

Number of elements: 5 

Shape: (5,) 

Dimension: 1 

Data type: int64


The *Index* object also acts like an ordered set:

In [49]:
A = pd.Index([2, 4, 6, 8, 10, 12, 14])
B = pd.Index([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

print('Intersection:',A.intersection(B),'\n')
print('Union:', A.union(B),'\n')
print('Symmetric difference:', A.symmetric_difference(B),'\n')

Intersection: Int64Index([2, 4, 6, 8, 10], dtype='int64') 

Union: Int64Index([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 14], dtype='int64') 

Symmetric difference: Int64Index([1, 3, 5, 7, 9, 12, 14], dtype='int64') 
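Two related operations round out the set-like view: a one-sided difference and an element-wise membership test. This is a small sketch using the same `A` and `B`; the names `only_in_A` and `in_B` are illustrative.

```python
import pandas as pd

A = pd.Index([2, 4, 6, 8, 10, 12, 14])
B = pd.Index([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

only_in_A = A.difference(B)   # elements of A that are not in B
in_B = A.isin(B)              # boolean array: is each element of A in B?

print(list(only_in_A))        # [12, 14]
print(in_B)
```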



## TASK
Given a DataFrame of random integers that includes negative values, discard every value that is not positive and arrange the largest remaining values into the biggest possible square matrix, keeping them in their original order.


In [50]:
# TASK

df = pd.DataFrame(np.random.randint(-20, 20, 100).reshape(10, 10))

print(df)
print()

# keep only the positive values (zeros and negatives become NaN and are dropped)
arr = df[df > 0].values.flatten()
arr_qualified = arr[~np.isnan(arr)]

# side length of the largest square matrix that the positive values can fill
n = int(np.floor(arr_qualified.shape[0]**.5))

# take the n**2 largest values, keeping them in their original order
top_indexes = np.argsort(arr_qualified)[::-1]
output = np.take(arr_qualified, sorted(top_indexes[:n**2])).reshape(n, n)

# convert to a DataFrame object (the NaN step produced floats, so cast back to int)
df2 = pd.DataFrame(data=output.astype('i'), columns=np.arange(output.shape[1]))

print(df2)

    0   1   2   3   4   5   6   7   8   9
0 -17   8  -9 -19 -18  12 -19  -3 -15 -17
1 -18   3  14 -16   0 -11   0  16   2   1
2  12  -1  17 -14  -4  15 -18  17  10  -2
3  -2  19 -18   4  13 -18 -15  12   1 -17
4   8   1  15  15  15  10  12 -20  -8 -16
5   2   6 -18 -15 -16 -12 -12   4  -1 -11
6  19   3 -17  13   6   3  -8  -2  15  -8
7 -18 -14 -13 -19  -9 -12  -4   6   9  -3
8 -16  -2 -12  -5   6  10   4   0  18 -15
9  -1  17 -11  15  10  13 -15 -19   8  -3

    0   1   2   3   4   5
0   8  12  14  16  12  17
1  15  17  10  19   4  13
2  12   8  15  15  15  10
3  12   6   4  19  13   6
4  15   6   9   6  10   4
5  18  17  15  10  13   8
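Because the input above is random, the result changes on every run. One way to check the logic is to wrap it in a function and try it on a small fixed input. The sketch below does that; the function name `largest_positive_square` and the sample data are hypothetical, and it filters with a boolean mask on the NumPy array instead of going through `NaN`, which keeps the values as integers.

```python
import numpy as np
import pandas as pd


def largest_positive_square(df):
    """Return the largest positive values of df as an n x n DataFrame.

    Hypothetical helper: keeps the n**2 largest positive values in their
    original row-major order, where n**2 is the largest perfect square
    that does not exceed the number of positive values.
    """
    values = df.to_numpy().ravel()
    positive = values[values > 0]
    n = int(np.floor(np.sqrt(positive.size)))
    # positions of the n**2 largest values, put back into original order
    keep = np.sort(np.argsort(positive)[::-1][:n * n])
    return pd.DataFrame(positive[keep].reshape(n, n))


small = pd.DataFrame([[-1, 5, 2],
                      [7, -3, 9],
                      [4, 0, -8]])
result = largest_positive_square(small)
print(result)
```

In this input the positive values are 5, 2, 7, 9 and 4. Five values allow a 2 x 2 matrix, so the smallest one (2) is dropped and the rest keep their order: 5, 7, 9, 4.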
