
pandas introduces two new data structures to Python, Series and DataFrame, both of which are built on top of NumPy (which is what makes them fast).

In [4]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
pd.set_option('display.max_columns', 50)
%matplotlib inline

In [10]:
#create a Series with an arbitrary list
lst = ["1-0-0 char-word-func_LogisticRegression_cv_real_accuracy_mean", 0.6615157480314962, "LogisticRegression_cv_real_accuracy_var", 0.006677417290381456, "balanced_accuracy_mean", 0.6407544723878662, "balanced_accuracy_var", 0.007419682842323463, "precision_mean", 0.6492852723697509, "precision_var", 0.0043547275072768855, "recall_mean", 0.8542857142857143, "var", 0.007787755102040816, "f1_mean", 0.7350087620812211, "f1_var", 0.0037145063261670623, "roc_auc_mean", 0.7881941923774954, "var", 0.012534935701240119, "fake_accuracy_mean", 0.6615157480314962, "fake_accuracy_var", 0.006677417290381456]
series = pd.Series(lst)
print(series)

0     1-0-0 char-word-func_LogisticRegression_cv_rea...
1                                              0.661516
2               LogisticRegression_cv_real_accuracy_var
3                                            0.00667742
4                                balanced_accuracy_mean
5                                              0.640754
6                                 balanced_accuracy_var
7                                            0.00741968
8                                        precision_mean
9                                              0.649285
10                                        precision_var
11                                           0.00435473
12                                          recall_mean
13                                             0.854286
14                                                  var
15                                           0.00778776
16                                              f1_mean
17                                             0

28
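The list above mixes strings and floats, which is why the printed Series does not have a numeric dtype. As a minimal sketch of that behaviour (the names `mixed` and `numeric` and the shortened values are illustrative, not taken from the run above):

```python
import pandas as pd

# A mixed list of strings and floats (shortened, illustrative values)
mixed = pd.Series(["f1_mean", 0.735, "f1_var", 0.0037])

# A list that contains only floats
numeric = pd.Series([0.735, 0.0037])

print(mixed.dtype)        # object
print(numeric.dtype)      # float64
print(list(mixed.index))  # [0, 1, 2, 3], the default integer index
```

Keeping names and values in separate places (for example, names in the index) lets the values stay numeric.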

Alternatively, you can specify an index to use when creating the Series.

In [17]:
# create a Series with a custom index
series = pd.Series(["1-0-0 char-word-func_LogisticRegression_cv_real_accuracy_mean", 0.661515748,
                    "LogisticRegression_cv_real_accuracy_var", 0.0066], 
                   index = ['A','B','C','D'])
print(series)

A    1-0-0 char-word-func_LogisticRegression_cv_rea...
B                                             0.661516
C              LogisticRegression_cv_real_accuracy_var
D                                               0.0066
dtype: object
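Once a Series has a custom index, items can be looked up either by label or by position. The sketch below assumes a small hypothetical Series called `scores`; `.loc` and `.iloc` are the standard pandas accessors for the two kinds of lookup.

```python
import pandas as pd

# Hypothetical metric values, indexed by name
scores = pd.Series([0.66, 0.64, 0.85],
                   index=["accuracy", "balanced_accuracy", "recall"])

print(scores.loc["recall"])  # label-based lookup, prints 0.85
print(scores.iloc[0])        # position-based lookup, prints 0.66
print(list(scores.index))    # the labels passed to the constructor
```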


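The Series constructor can convert a dictionary as well, using the keys of the dictionary as its index. Note that the key "var" appears twice in the dictionary below, so Python keeps only the last value, and that None is stored as NaN.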

In [36]:
# build the Series from a dictionary
clas = {"1-0-0 char-word-func_LogisticRegression_cv_real_accuracy_mean": 0.6615157480314962, 
        "LogisticRegression_cv_real_accuracy_var": 0.006677417290381456, 
        "balanced_accuracy_mean": 0.6407544723878662, "balanced_accuracy_var": 0.007419682842323463,
        "precision_mean": 0.6492852723697509, "precision_var": 0.0043547275072768855,
        "recall_mean": 0.8542857142857143, "var": 0.007787755102040816, 
        "f1_mean": 0.7350087620812211, "f1_var": 0.0037145063261670623, 
        "roc_auc_mean": 0.7881941923774954, "var": 0.012534935701240119, 
        "fake_accuracy_mean": 0.6615157480314962, "fake_accuracy_var": None}
S = pd.Series(clas)
print(S)

1-0-0 char-word-func_LogisticRegression_cv_real_accuracy_mean    0.661516
LogisticRegression_cv_real_accuracy_var                          0.006677
balanced_accuracy_mean                                           0.640754
balanced_accuracy_var                                            0.007420
precision_mean                                                   0.649285
precision_var                                                    0.004355
recall_mean                                                      0.854286
var                                                              0.012535
f1_mean                                                          0.735009
f1_var                                                           0.003715
roc_auc_mean                                                     0.788194
fake_accuracy_mean                                               0.661516
fake_accuracy_var                                                     NaN
dtype: float64
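The dictionary above writes the key "var" twice and sets one value to None, which explains why the output has a single "var" row and a NaN. A minimal sketch on a smaller, hypothetical dictionary (`metrics` and its values are illustrative):

```python
import pandas as pd

# "var" is written twice; a Python dict keeps only the last value for a repeated key
metrics = {"recall_mean": 0.85, "var": 0.0078, "f1_mean": 0.74, "var": 0.0125,
           "fake_accuracy_var": None}

s = pd.Series(metrics)

print(len(metrics))    # 4, not 5
print(s["var"])        # 0.0125
print(s.dtype)         # float64
print(s.isna().sum())  # 1, because None became NaN
```

If both variances matter, giving them distinct keys (for example "recall_var" and "roc_auc_var", names assumed here) avoids the silent overwrite.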


In [37]:
S['f1_mean']

0.7350087620812211

In [38]:
S[['f1_mean', 'f1_var']]

f1_mean    0.735009
f1_var     0.003715
dtype: float64

Or you can use boolean indexing for selection.

In [39]:
# show only the items in the Series S whose values are less than 0.70
S[S<0.70]

1-0-0 char-word-func_LogisticRegression_cv_real_accuracy_mean    0.661516
LogisticRegression_cv_real_accuracy_var                          0.006677
balanced_accuracy_mean                                           0.640754
balanced_accuracy_var                                            0.007420
precision_mean                                                   0.649285
precision_var                                                    0.004355
var                                                              0.012535
f1_var                                                           0.003715
fake_accuracy_mean                                               0.661516
dtype: float64

To make this clearer: S < 0.70 returns a Series of True/False values, which we then pass to our Series S, returning only the items where the value is True.

In [40]:
less_than_070 = S < 0.70
print(less_than_070)

print('\n')

print(S[less_than_070])

1-0-0 char-word-func_LogisticRegression_cv_real_accuracy_mean     True
LogisticRegression_cv_real_accuracy_var                           True
balanced_accuracy_mean                                            True
balanced_accuracy_var                                             True
precision_mean                                                    True
precision_var                                                     True
recall_mean                                                      False
var                                                               True
f1_mean                                                          False
f1_var                                                            True
roc_auc_mean                                                     False
fake_accuracy_mean                                                True
fake_accuracy_var                                                False
dtype: bool


1-0-0 char-word-func_LogisticRegression_cv_real_accuracy_mean  
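Boolean masks can also be combined. The sketch below uses a small hypothetical Series and the element-wise operators `&` (and), `|` (or) and `~` (not); the thresholds are arbitrary.

```python
import pandas as pd

s = pd.Series({"accuracy": 0.66, "recall": 0.85, "f1": 0.74, "f1_var": 0.004})

# Each comparison needs its own parentheses because & and | bind tighter than < and >
mid_range = s[(s > 0.5) & (s < 0.8)]
extremes = s[(s < 0.01) | (s > 0.8)]
not_low = s[~(s < 0.5)]

print(mid_range)  # accuracy and f1
print(extremes)   # recall and f1_var
print(not_low)    # accuracy, recall and f1
```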

You can also change the values in a Series on the fly.

In [41]:
# changing based on the index
print('precision_mean:', S['precision_mean'])
S['precision_mean'] = 80
print('New value of precision_mean :', S['precision_mean'])

precision_mean: 0.6492852723697509
New value of precision_mean : 80.0


In [44]:
# changing values using boolean logic
print(S[S < 0.70])
print('\n')

S[S < 0.70] = 0.68

print(S[S < 0.70])

1-0-0 char-word-func_LogisticRegression_cv_real_accuracy_mean    0.661516
LogisticRegression_cv_real_accuracy_var                          0.006677
balanced_accuracy_mean                                           0.640754
balanced_accuracy_var                                            0.007420
precision_var                                                    0.004355
var                                                              0.012535
f1_var                                                           0.003715
fake_accuracy_mean                                               0.661516
dtype: float64


1-0-0 char-word-func_LogisticRegression_cv_real_accuracy_mean    0.68
LogisticRegression_cv_real_accuracy_var                          0.68
balanced_accuracy_mean                                           0.68
balanced_accuracy_var                                            0.68
precision_var                                                    0.68
var                                      
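Assigning through a boolean mask modifies the Series in place, so every later cell sees the changed values. If the original values should be kept, one option is `Series.mask`, which returns a new Series instead. A minimal sketch with hypothetical data:

```python
import pandas as pd

s = pd.Series({"accuracy": 0.66, "recall": 0.85, "f1_var": 0.004})

# mask() replaces values where the condition is True and returns a new Series
capped = s.mask(s < 0.70, 0.68)

print(capped)  # accuracy 0.68, recall 0.85, f1_var 0.68
print(s)       # the original Series is unchanged
```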

What if you aren't sure whether an item is in the Series? You can check using idiomatic Python.

In [45]:
print('Seattle' in S)
print('f1_var' in S)

False
True
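Note that `in` tests the index labels of a Series, not its values. The following sketch, on a small hypothetical Series, contrasts the two and shows `Series.get` as a way to look up a label that may be missing:

```python
import pandas as pd

s = pd.Series({"f1_mean": 0.74, "f1_var": 0.004})

print("f1_var" in s)         # True: "f1_var" is an index label
print(0.74 in s)             # False: 0.74 is a value, not a label
print(0.74 in s.values)      # True: membership test against the values
print(s.get("Seattle", -1))  # -1: get() returns the default for a missing label
```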


Mathematical operations can be done using scalars and functions.

In [46]:
# divide every value by 3
S / 3

1-0-0 char-word-func_LogisticRegression_cv_real_accuracy_mean     0.226667
LogisticRegression_cv_real_accuracy_var                           0.226667
balanced_accuracy_mean                                            0.226667
balanced_accuracy_var                                             0.226667
precision_mean                                                   26.666667
precision_var                                                     0.226667
recall_mean                                                       0.284762
var                                                               0.226667
f1_mean                                                           0.245003
f1_var                                                            0.226667
roc_auc_mean                                                      0.262731
fake_accuracy_mean                                                0.226667
fake_accuracy_var                                                      NaN
dtype: float64

In [47]:
# square every value
np.square(S)

1-0-0 char-word-func_LogisticRegression_cv_real_accuracy_mean       0.462400
LogisticRegression_cv_real_accuracy_var                             0.462400
balanced_accuracy_mean                                              0.462400
balanced_accuracy_var                                               0.462400
precision_mean                                                   6400.000000
precision_var                                                       0.462400
recall_mean                                                         0.729804
var                                                                 0.462400
f1_mean                                                             0.540238
f1_var                                                              0.462400
roc_auc_mean                                                        0.621250
fake_accuracy_mean                                                  0.462400
fake_accuracy_var                                                        NaN
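Both outputs above show NaN passing through the arithmetic unchanged. As a small sketch of the same behaviour with hypothetical values, element-wise operations propagate NaN, while reductions such as `sum` skip it by default:

```python
import numpy as np
import pandas as pd

s = pd.Series({"a": 1.0, "b": 4.0, "c": np.nan})

print(s * 2 + 1)   # a 3.0, b 9.0, c NaN
print(np.sqrt(s))  # a 1.0, b 2.0, c NaN
print(s.sum())     # 5.0, because NaN is skipped by default
```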

You can add two Series together, which returns a union of the two Series with the addition occurring on the shared index values. Values on either Series that did not have a shared index will produce a NULL/NaN (not a number).

In [49]:
print(S[['f1_mean', 'fake_accuracy_mean', 'roc_auc_mean']])
print('\n')
print(S[['var', 'precision_var']])
print('\n')
print(S[['f1_mean', 'fake_accuracy_mean', 'roc_auc_mean']] + S[['var', 'precision_var']])

f1_mean               0.735009
fake_accuracy_mean    0.680000
roc_auc_mean          0.788194
dtype: float64


var              0.68
precision_var    0.68
dtype: float64


f1_mean              NaN
fake_accuracy_mean   NaN
precision_var        NaN
roc_auc_mean         NaN
var                  NaN
dtype: float64


Notice that because the two selections have no index labels in common, every label in the union is returned with a NULL/NaN value.
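To see the addition actually happen on a shared label, here is a minimal sketch with two small hypothetical Series (`left` and `right`, with made-up values) that have one label in common. It also shows `Series.add` with `fill_value`, which treats a label missing from one side as 0 instead of producing NaN.

```python
import pandas as pd

left = pd.Series({"f1_mean": 0.75, "roc_auc_mean": 0.5})
right = pd.Series({"roc_auc_mean": 0.25, "var": 0.125})

# Plain addition: only the shared label "roc_auc_mean" gets a number
total = left + right

# add() with fill_value=0 treats a label missing on one side as 0
filled = left.add(right, fill_value=0)

print(total)   # f1_mean NaN, roc_auc_mean 0.75, var NaN
print(filled)  # f1_mean 0.75, roc_auc_mean 0.75, var 0.125
```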

NULL checking can be performed with isnull and notnull.

In [50]:
# returns a boolean series indicating which values aren't NULL
S.notnull()

1-0-0 char-word-func_LogisticRegression_cv_real_accuracy_mean     True
LogisticRegression_cv_real_accuracy_var                           True
balanced_accuracy_mean                                            True
balanced_accuracy_var                                             True
precision_mean                                                    True
precision_var                                                     True
recall_mean                                                       True
var                                                               True
f1_mean                                                           True
f1_var                                                            True
roc_auc_mean                                                      True
fake_accuracy_mean                                                True
fake_accuracy_var                                                False
dtype: bool

In [51]:
# use boolean logic to grab the NULL values
print(S.isnull())
print('\n')
print(S[S.isnull()])

1-0-0 char-word-func_LogisticRegression_cv_real_accuracy_mean    False
LogisticRegression_cv_real_accuracy_var                          False
balanced_accuracy_mean                                           False
balanced_accuracy_var                                            False
precision_mean                                                   False
precision_var                                                    False
recall_mean                                                      False
var                                                              False
f1_mean                                                          False
f1_var                                                           False
roc_auc_mean                                                     False
fake_accuracy_mean                                               False
fake_accuracy_var                                                 True
dtype: bool


fake_accuracy_var   NaN
dtype: float64
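Once the missing values are located, they are usually either dropped or replaced. A minimal sketch using `dropna` and `fillna` on a small hypothetical Series (the replacement value 0.0 is an arbitrary choice):

```python
import numpy as np
import pandas as pd

s = pd.Series({"f1_mean": 0.74, "fake_accuracy_var": np.nan})

dropped = s.dropna()    # keeps only the non-null items
filled = s.fillna(0.0)  # replaces NaN with 0.0

print(dropped)
print(filled)
print(s.isnull().sum())  # 1, the original Series still has its NaN
```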


http://www.gregreda.com/2013/10/26/intro-to-pandas-data-structures/ 
    
I have worked through this link up to just before the DataFrame section.