# Data Analysis With PANDAS
### Pandas - Part 1

`Pandas` is a powerful data analysis and manipulation library for Python. It is mainly used to work with tabular data (similar to the data stored in a spreadsheet). Pandas can also read large datasets from a wide range of file formats, such as CSV, Excel sheets, JSON, HTML, SQL databases and so on.

Pandas is a Python package providing fast, flexible, and expressive data structures designed to make working with "relational" or "labeled" data both easy and intuitive. It aims to be the fundamental high-level building block for doing practical, real-world data analysis in Python. Additionally, it has the broader goal of becoming the most powerful and flexible open source data analysis/manipulation tool available in any language.

### Main Features Of Pandas

According to the pandas documentation, here are a few of the things that Pandas does well:

- Easy handling of missing data in floating point as well as non-floating point data.
- Size mutability: columns can be inserted into and deleted from DataFrame and higher-dimensional objects.
- Automatic and explicit data alignment: objects can be explicitly aligned to a set of labels, or the user can simply ignore the labels and let Series, DataFrame, etc. automatically align the data for you in computations.
- Powerful, flexible group by functionality to perform split-apply-combine operations on data sets, for both aggregating and transforming data.
- Makes it easy to convert ragged, differently-indexed data in other Python and NumPy data structures into DataFrame objects.
- Intelligent label-based slicing, fancy indexing, and subsetting of large data sets.
- Intuitive merging and joining of data sets.
- Flexible reshaping and pivoting of data sets.
- Time series-specific functionality: date range generation and frequency conversion, moving window statistics, date shifting and lagging.

Before diving deeper into Pandas, install it using pip:

### Installation
```bash
pip install pandas
```

### Create DataFrame

A DataFrame is a table-like representation of data. The data is organized into a two-dimensional table of horizontal rows and vertical columns, much like a spreadsheet.

```py
import pandas as pd
import numpy as np
```

```py
df = pd.DataFrame({
    "Anime": ["One Piece", "Naruto", "Bleach", "Gintama"],
    "Episodes": [1014, 720, 366, 366]
})
```

```py
df
```

| | Anime | Episodes |
|---|---|---|
| 0 | One Piece | 1014 |
| 1 | Naruto | 720 |
| 2 | Bleach | 366 |
| 3 | Gintama | 366 |


```py
print(type(df))
```

```
<class 'pandas.core.frame.DataFrame'>
```


### Adding a New Column to a Pandas DataFrame

Say we need to add a new column holding the name of each anime's main character. How do you do that? It's quite simple:
```py
# Syntax
df['new_column_name'] = data
```

```py
df['Main Character'] = ['Monkey D Luffy', 'Naruto', 'Ichigo', 'Gintoki']
```

```py
df
```

| | Anime | Episodes | Main Character |
|---|---|---|---|
| 0 | One Piece | 1014 | Monkey D Luffy |
| 1 | Naruto | 720 | Naruto |
| 2 | Bleach | 366 | Ichigo |
| 3 | Gintama | 366 | Gintoki |


```py
df['check'] = 1  # a single scalar value is repeated for every row
```

```py
df
```

| | Anime | Episodes | Main Character | check |
|---|---|---|---|---|
| 0 | One Piece | 1014 | Monkey D Luffy | 1 |
| 1 | Naruto | 720 | Naruto | 1 |
| 2 | Bleach | 366 | Ichigo | 1 |
| 3 | Gintama | 366 | Gintoki | 1 |
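The same syntax also accepts values computed from existing columns. The sketch below rebuilds the small anime DataFrame. The column name `Long Running` and the 500-episode cut-off are hypothetical and chosen only for illustration. It also shows that a list must contain exactly one value per row: a shorter list is rejected with a `ValueError`, and no column is added.

```python
import pandas as pd

df = pd.DataFrame({
    "Anime": ["One Piece", "Naruto", "Bleach", "Gintama"],
    "Episodes": [1014, 720, 366, 366],
})

# Computed column: one True/False value per row.
# "Long Running" and the 500 cut-off are illustrative choices only.
df["Long Running"] = df["Episodes"] > 500

# Three values for four rows: pandas refuses the assignment.
try:
    df["Main Character"] = ["Monkey D Luffy", "Naruto", "Ichigo"]
except ValueError as err:
    print("ValueError:", err)

print(df)
```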


### Dot Notation and Square Notation To Represent Columns

```py
df['Anime']  # select just one column instead of the whole DataFrame
```

```
0    One Piece
1       Naruto
2       Bleach
3      Gintama
Name: Anime, dtype: object
```

```py
# Method 2: dot notation
df.Anime
```

```
0    One Piece
1       Naruto
2       Bleach
3      Gintama
Name: Anime, dtype: object
```

```py
# Multiple column selection
df[["Main Character", "Anime"]]
```

| | Main Character | Anime |
|---|---|---|
| 0 | Monkey D Luffy | One Piece |
| 1 | Naruto | Naruto |
| 2 | Ichigo | Bleach |
| 3 | Gintoki | Gintama |


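Dot notation is convenient for selecting a single column. It only works, however, when the column name is a valid Python identifier (no spaces) that does not clash with an existing DataFrame attribute or method. Square-bracket notation always works, and it is required for selecting multiple columns.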
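A small sketch of the two notations on a hypothetical DataFrame. Its column names were picked to show the limits of dot notation: `Main Character` contains a space, and `count` clashes with the built-in `DataFrame.count` method.

```python
import pandas as pd

# Hypothetical frame: one name contains a space, another clashes with a method.
df = pd.DataFrame({
    "Anime": ["One Piece", "Naruto"],
    "Main Character": ["Monkey D Luffy", "Naruto"],
    "count": [1014, 720],
})

print(df.Anime)               # works: "Anime" is a valid identifier
print(df["Main Character"])   # a name with a space needs square brackets
print(df["count"])            # the "count" column itself
print(df.count)               # the DataFrame.count method, NOT the column
```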

### Modifying the Index of a DataFrame

```py
df = pd.DataFrame({
    "Anime": ["One Piece", "Naruto", "Bleach", "Gintama"],
    "Episodes": [994, 720, 366, 366]
}, index=['a', 'b', 'c', 'd'])  # custom index labels instead of 0, 1, 2, 3
```

```py
df
```

| | Anime | Episodes |
|---|---|---|
| a | One Piece | 994 |
| b | Naruto | 720 |
| c | Bleach | 366 |
| d | Gintama | 366 |


```py
# Set a column as the index using the set_index method
df.set_index("Episodes")
```

| Episodes | Anime |
|---|---|
| 994 | One Piece |
| 720 | Naruto |
| 366 | Bleach |
| 366 | Gintama |
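Note that `set_index()` returns a new DataFrame. It leaves `df` unchanged unless you assign the result (or pass `inplace=True`). Index labels also do not have to be unique: 366 appears twice above. The minimal sketch below rebuilds the same DataFrame and checks this. It then uses `reset_index()` to turn the index back into a regular column.

```python
import pandas as pd

df = pd.DataFrame(
    {"Anime": ["One Piece", "Naruto", "Bleach", "Gintama"],
     "Episodes": [994, 720, 366, 366]},
    index=["a", "b", "c", "d"],
)

indexed = df.set_index("Episodes")   # a new DataFrame; df itself is untouched
print(df.index.tolist())             # ['a', 'b', 'c', 'd']
print(indexed.index.tolist())        # [994, 720, 366, 366]
print(indexed.index.is_unique)       # False: 366 appears twice

# reset_index() moves the index back into an ordinary column
restored = indexed.reset_index()
print(restored.columns.tolist())     # ['Episodes', 'Anime']
```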


### Create A Pandas Series

Unlike a DataFrame, a Series is a one-dimensional representation of data. You can pass either a one-dimensional ndarray or a list to create a Series.

```py
df = pd.Series(["Good", "Bad", "Neutral"], index=['a', 'b', 'c'])
df2 = pd.Series(np.random.rand(10), index=list(range(1, 11)))  # 10 random floats in [0, 1)
```

```py
print(df, '\n')
print(df2)
```

```
a       Good
b        Bad
c    Neutral
dtype: object 

1     0.654721
2     0.577851
3     0.761127
4     0.799087
5     0.314340
6     0.483692
7     0.802463
8     0.461881
9     0.302768
10    0.650401
dtype: float64
```
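Because `np.random.rand` draws new numbers on every run, the values above will differ on your machine. The minimal sketch below assumes NumPy 1.17 or later, which provides the `default_rng` generator API. It makes the random Series reproducible with a seed, and it also builds a Series from a dict, whose keys become the index labels.

```python
import numpy as np
import pandas as pd

# Seeded generator: the same seed always produces the same numbers.
rng = np.random.default_rng(seed=42)
random_series = pd.Series(rng.random(10), index=range(1, 11))

# A dict is another way to build a Series: keys become the index labels.
moods = pd.Series({"a": "Good", "b": "Bad", "c": "Neutral"})

print(random_series)
print(moods)
```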


Notice that the dtype (data type) differs between the two Series above. The first Series is built from a list of strings, so pandas stores it with the generic `object` dtype, which it uses for strings and mixed-type data. The second Series holds numbers with a fractional part, so its dtype is `float64`, where 64 denotes 64 bits.
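Not every non-numeric type ends up as `object`, though. The quick sketch below shows how pandas infers a dtype from the values passed in; the sample values are illustrative. The strings line prints `object` on pandas 2.x, but newer pandas releases may report a dedicated string dtype instead, so treat that particular output as version dependent.

```python
import pandas as pd

# pandas infers one dtype per Series from the values passed in.
integers = pd.Series([1014, 720, 366])
floats = pd.Series([0.65, 0.57, 0.76])
booleans = pd.Series([True, False, True])
strings = pd.Series(["Good", "Bad", "Neutral"])
mixed = pd.Series([1, 2.5])          # ints and floats together are upcast to float64

for name, s in [("integers", integers), ("floats", floats),
                ("booleans", booleans), ("strings", strings), ("mixed", mixed)]:
    print(name, s.dtype)
```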

### Reading CSV File

Up until now we have used small example data to create a DataFrame. In the real world, most data comes in the form of .csv files. CSV stands for comma-separated values, and such files can be opened in a spreadsheet program such as Excel. In Machine Learning, Data Analysis and Data Visualization you will deal with large datasets, which are often stored in .csv files, and Pandas makes reading them easy. Python also ships with a built-in `csv` module for reading and writing CSV files, but here we shall focus only on reading CSV files with Pandas.

```py
# Syntax
dataset = pd.read_csv("csv_file_path.csv")
```
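`read_csv()` accepts any file-like object as well as a file path, so you can try it without a file on disk. In this sketch, an in-memory `io.StringIO` buffer stands in for a real file. The three data rows are copied from the Naruto arcs output shown later in this section.

```python
import io

import pandas as pd

# Stand-in for a real .csv file; the first line holds the column names.
csv_text = """Arc names,Total Episodes
Chūnin Exams Arc,48
Konoha Crush Arc,13
Search for Tsunade Arc,20
"""

arcs = pd.read_csv(io.StringIO(csv_text))
print(arcs)
print(arcs.dtypes)   # the episode counts are parsed as integers
```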

You might be wondering where this data comes from. You can find a large number of datasets on [Kaggle](http://kaggle.com/) and [DPHI](https://dphi.tech/). You can even create your own CSV file, either by entering data in Excel or by web scraping.

```py
data = pd.read_csv("naruto_analysis.csv")
# Well, I am a One Piece fan. Never mind which dataset I chose; just focus on understanding the code
```

Note: The given dataset was obtained by web scraping the [ListFist](https://listfist.com/list-of-one-piece-arcs) website, and the tutorial is available on [Anime Vyuh](https://animevyuh.org/the-big-three-anime/). Credit to the original owner of the data is given in the article.

### Viewing the Data in a Dataset

#### 1. head()

By default, head() returns the first 5 rows of the dataset.
```py
# Syntax
df.head()   # default: first 5 rows
df.head(n)  # first n rows
```

```py
data.head()
```

| | Arc names | Total Episodes |
|---|---|---|
| 0 | Prologue — Land of Waves Arc | 19 |
| 1 | Chūnin Exams Arc | 48 |
| 2 | Konoha Crush Arc | 13 |
| 3 | Search for Tsunade Arc | 20 |
| 4 | Land of Tea Escort Mission Arc | 6 |


As mentioned earlier, head() displays the first 5 rows by default. But what if you need to display the first 10 rows of the dataset? You can do that by passing an integer as the argument to head():

```py
data.head(10)  # here n = 10
```

| | Arc names | Total Episodes |
|---|---|---|
| 0 | Prologue — Land of Waves Arc | 19 |
| 1 | Chūnin Exams Arc | 48 |
| 2 | Konoha Crush Arc | 13 |
| 3 | Search for Tsunade Arc | 20 |
| 4 | Land of Tea Escort Mission Arc | 6 |
| 5 | Sasuke Recovery Mission Arc | 29 |
| 6 | Filler Arcs Arc | 85 |
| 7 | Kakashi Gaiden Arc | 0 |
| 8 | Special | 17 |
| 9 | Kazekage Rescue Mission Arc | 32 |


#### 2. tail()

tail() works like head(), but it returns rows from the bottom (end) of the dataset.

```py
# Syntax
df.tail()   # default: last 5 rows
df.tail(n)  # last n rows
```

```py
data.tail()
```

| | Arc names | Total Episodes |
|---|---|---|
| 32 | Childhood Arc | 4 |
| 33 | Sasuke Shinden: Book of Sunrise Arc | 5 |
| 34 | Shikamaru Hiden: A Cloud Drifting in Silent Da... | 5 |
| 35 | Konoha Hiden: The Perfect Day for a Wedding Arc | 7 |
| 36 | The Seventh Hokage and the Scarlet Spring Arc | 0 |


If no integer argument is passed to tail(), it uses the default of 5 rows, counted from the bottom.

To display the last 8 rows, pass 8 as the argument:

```py
data.tail(8)
```

| | Arc names | Total Episodes |
|---|---|---|
| 29 | Jiraiya Shinobi Handbook: The Tale of Naruto t... | 19 |
| 30 | Kaguya Ōtsutsuki Strikes Arc | 23 |
| 31 | Itachi Shinden Book: Light and Darkness Arc | 5 |
| 32 | Childhood Arc | 4 |
| 33 | Sasuke Shinden: Book of Sunrise Arc | 5 |
| 34 | Shikamaru Hiden: A Cloud Drifting in Silent Da... | 5 |
| 35 | Konoha Hiden: The Perfect Day for a Wedding Arc | 7 |
| 36 | The Seventh Hokage and the Scarlet Spring Arc | 0 |


Remember that head() and tail() return rows in their original order.
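`head(n)` and `tail(n)` select rows by position. The small sketch below uses a hypothetical 8-row frame to confirm that they match `iloc` slicing. It also shows that a negative `n` passed to `head()` returns every row except the last `|n|` rows.

```python
import pandas as pd

# Hypothetical 8-row frame standing in for the arcs dataset.
df = pd.DataFrame({"arc": [f"Arc {i}" for i in range(8)], "episodes": range(8)})

first_three = df.head(3)         # same rows as df.iloc[:3]
last_three = df.tail(3)          # same rows as df.iloc[-3:]
all_but_last_two = df.head(-2)   # negative n: drop that many rows from the end

print(first_three)
print(last_three)
print(all_but_last_two)
```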

The third way to view the dataset is the sample() method.

#### 3. sample()

sample() returns randomly selected rows, so both the rows and their order can change every time you run it.
```py
# Syntax
df.sample()   # by default, one random row
df.sample(n)  # n random rows
```

```py
data.sample()  # just one randomly selected row
```

| | Arc names | Total Episodes |
|---|---|---|
| 16 | Fated Battle Between Brothers Arc | 10 |


```py
data.sample(5)  # 5 random rows
```

| | Arc names | Total Episodes |
|---|---|---|
| 19 | Past Arc: The Locus of Konoha Arc | 21 |
| 31 | Itachi Shinden Book: Light and Darkness Arc | 5 |
| 8 | Special | 17 |
| 26 | Kakashi's Anbu Arc: The Shinobi That Lives in ... | 13 |
| 16 | Fated Battle Between Brothers Arc | 10 |
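Because `sample()` is random, rerunning the cells above gives different rows. Passing `random_state` fixes the seed so the selection is repeatable. Passing `frac` asks for a fraction of the rows instead of a fixed count. Here is a sketch on a hypothetical 10-row frame:

```python
import pandas as pd

df = pd.DataFrame({"arc": [f"Arc {i}" for i in range(10)], "episodes": range(10)})

# random_state fixes the seed, so the "random" selection is repeatable.
picked_once = df.sample(3, random_state=42)
picked_again = df.sample(3, random_state=42)

# frac selects a fraction of the rows instead of a fixed count.
half = df.sample(frac=0.5, random_state=0)

print(picked_once)
print(half)
```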


### Info and Describe

info() prints a concise summary of a DataFrame: the index range, the data columns and the memory usage. For each column, it reports the column name, the non-null count and the dtype.


```py
data.info()
```

```
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 37 entries, 0 to 36
Data columns (total 2 columns):
 #   Column          Non-Null Count  Dtype 
---  ------          --------------  ----- 
 0   Arc names       37 non-null     object
 1   Total Episodes  37 non-null     int64 
dtypes: int64(1), object(1)
memory usage: 720.0+ bytes
```


The describe() method returns summary statistics for the numeric columns of a DataFrame. These are the count, mean, standard deviation (std) and minimum, the 25th, 50th (median) and 75th percentiles, and the maximum.

```py
data.describe()
```

| | Total Episodes |
|---|---|
| count | 37.0 |
| mean | 19.459459 |
| std | 16.485001 |
| min | 0.0 |
| 25% | 7.0 |
| 50% | 18.0 |
| 75% | 22.0 |
| max | 85.0 |
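`describe()` can also summarise text data. On a column of strings, it reports the count, the number of unique values, the most frequent value (`top`) and how often that value occurs (`freq`). Here is a sketch with a hypothetical two-column frame:

```python
import pandas as pd

# Hypothetical frame with one text column and one numeric column.
df = pd.DataFrame({
    "Genre": ["Shounen", "Comedy", "Shounen", "Action"],
    "Total Episodes": [720, 366, 144, 366],
})

numeric_summary = df.describe()          # default: numeric columns only
genre_summary = df["Genre"].describe()   # count, unique, top, freq for text data

print(numeric_summary)
print(genre_summary)
```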


Let's take a bigger dataset to understand `info()` and `describe()` better.

```py
dataset = pd.read_csv("data.csv")
```

```py
dataset.head(10)
```

| | show_id | type | title | director | cast | country | date_added | release_year | rating | duration | listed_in | description |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | s1 | Movie | Dick Johnson Is Dead | Kirsten Johnson | NaN | United States | September 25, 2021 | 2020 | PG-13 | 90 min | Documentaries | As her father nears the end of his life, filmm... |
| 1 | s2 | TV Show | Blood & Water | NaN | Ama Qamata, Khosi Ngema, Gail Mabalane, Thaban... | South Africa | September 24, 2021 | 2021 | TV-MA | 2 Seasons | International TV Shows, TV Dramas, TV Mysteries | After crossing paths at a party, a Cape Town t... |
| 2 | s3 | TV Show | Ganglands | Julien Leclercq | Sami Bouajila, Tracy Gotoas, Samuel Jouy, Nabi... | NaN | September 24, 2021 | 2021 | TV-MA | 1 Season | Crime TV Shows, International TV Shows, TV Act... | To protect his family from a powerful drug lor... |
| 3 | s4 | TV Show | Jailbirds New Orleans | NaN | NaN | NaN | September 24, 2021 | 2021 | TV-MA | 1 Season | Docuseries, Reality TV | Feuds, flirtations and toilet talk go down amo... |
| 4 | s5 | TV Show | Kota Factory | NaN | Mayur More, Jitendra Kumar, Ranjan Raj, Alam K... | India | September 24, 2021 | 2021 | TV-MA | 2 Seasons | International TV Shows, Romantic TV Shows, TV ... | In a city of coaching centers known to train I... |
| 5 | s6 | TV Show | Midnight Mass | Mike Flanagan | Kate Siegel, Zach Gilford, Hamish Linklater, H... | NaN | September 24, 2021 | 2021 | TV-MA | 1 Season | TV Dramas, TV Horror, TV Mysteries | The arrival of a charismatic young priest brin... |
| 6 | s7 | Movie | My Little Pony: A New Generation | Robert Cullen, José Luis Ucha | Vanessa Hudgens, Kimiko Glenn, James Marsden, ... | NaN | September 24, 2021 | 2021 | PG | 91 min | Children & Family Movies | Equestria's divided. But a bright-eyed hero be... |
| 7 | s8 | Movie | Sankofa | Haile Gerima | Kofi Ghanaba, Oyafunmike Ogunlano, Alexandra D... | United States, Ghana, Burkina Faso, United Kin... | September 24, 2021 | 1993 | TV-MA | 125 min | Dramas, Independent Movies, International Movies | On a photo shoot in Ghana, an American model s... |
| 8 | s9 | TV Show | The Great British Baking Show | Andy Devonshire | Mel Giedroyc, Sue Perkins, Mary Berry, Paul Ho... | United Kingdom | September 24, 2021 | 2021 | TV-14 | 9 Seasons | British TV Shows, Reality TV | A talented batch of amateur bakers face off in... |
| 9 | s10 | Movie | The Starling | Theodore Melfi | Melissa McCarthy, Chris O'Dowd, Kevin Kline, T... | United States | September 24, 2021 | 2021 | PG-13 | 104 min | Comedies, Dramas | A woman adjusting to life after a loss contend... |


```py
dataset.info()
```

```
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 8807 entries, 0 to 8806
Data columns (total 12 columns):
 #   Column        Non-Null Count  Dtype 
---  ------        --------------  ----- 
 0   show_id       8807 non-null   object
 1   type          8807 non-null   object
 2   title         8807 non-null   object
 3   director      6173 non-null   object
 4   cast          7982 non-null   object
 5   country       7976 non-null   object
 6   date_added    8797 non-null   object
 7   release_year  8807 non-null   int64 
 8   rating        8803 non-null   object
 9   duration      8804 non-null   object
 10  listed_in     8807 non-null   object
 11  description   8807 non-null   object
dtypes: int64(1), object(11)
memory usage: 825.8+ KB
```


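From info() we learn that the dataset has 12 columns, along with their names and data types. The non-null counts also show which columns contain missing values. For example, `director` has only 6173 non-null entries out of 8807.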

```py
dataset.describe()  # by default, summarises only the numeric columns
```

| | release_year |
|---|---|
| count | 8807.0 |
| mean | 2014.180198 |
| std | 8.819312 |
| min | 1925.0 |
| 25% | 2013.0 |
| 50% | 2017.0 |
| 75% | 2019.0 |
| max | 2021.0 |


### Shape of Dataset

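```py
dataset.shape
```

```
(8807, 12)
```

`shape` is a tuple of (number of rows, number of columns), so this dataset has 8807 rows and 12 columns.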

### Checking for Null Data

Pandas can also detect missing (null) values. isnull() returns a DataFrame of the same shape, holding True where a value is missing and False otherwise.

```py
dataset.isnull()
```

| | show_id | type | title | director | cast | country | date_added | release_year | rating | duration | listed_in | description |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | False | False | False | False | True | False | False | False | False | False | False | False |
| 1 | False | False | False | True | False | False | False | False | False | False | False | False |
| 2 | False | False | False | False | False | True | False | False | False | False | False | False |
| 3 | False | False | False | True | True | True | False | False | False | False | False | False |
| 4 | False | False | False | True | False | False | False | False | False | False | False | False |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 8802 | False | False | False | False | False | False | False | False | False | False | False | False |
| 8803 | False | False | False | True | True | True | False | False | False | False | False | False |
| 8804 | False | False | False | False | False | False | False | False | False | False | False | False |
| 8805 | False | False | False | False | False | False | False | False | False | False | False | False |


```py
dataset.isnull().sum()
```

```
show_id            0
type               0
title              0
director        2634
cast             825
country          831
date_added        10
release_year       0
rating             4
duration           3
listed_in          0
description        0
dtype: int64
```

The above command counts the missing values in each column. There is also isna(), which is an alias of isnull() and therefore returns exactly the same per-column counts:

```py
dataset.isna().sum()
```

```
show_id            0
type               0
title              0
director        2634
cast             825
country          831
date_added        10
release_year       0
rating             4
duration           3
listed_in          0
description        0
dtype: int64
```
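To get a single total for the entire dataset, chain a second `.sum()` onto the per-column counts. Using `.mean()` instead of `.sum()` gives the fraction of missing values in each column. Here is a sketch on a small hypothetical frame that mimics the `director` and `country` columns:

```python
import numpy as np
import pandas as pd

# Small hypothetical frame with a few missing values (None / np.nan).
df = pd.DataFrame({
    "director": ["Kirsten Johnson", None, "Julien Leclercq", None],
    "country": ["United States", "South Africa", np.nan, np.nan],
    "release_year": [2020, 2021, 2021, 2021],
})

per_column = df.isna().sum()          # missing values in each column
total = df.isna().sum().sum()         # the second sum adds up the per-column counts
share = df.isna().mean() * 100        # percentage of missing values per column

print(per_column)
print("Total missing values:", total)
print(share)
```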

Note: We will look at how to deal with missing values in Part 2.

### Filtering or Masking a Dataset

```py
data = {
    "Anime Names": [
        "One Piece", "Naruto", "Gintama", "Bleach", "Attack On Titan", "Code Geass",
        "Death Note", "Haikyuu", "Erased", "Hunter X Hunter", "Hyouka", "No Game No Life",
        "Kakegurui", "Jujutsu Kaisen", "Anohana", "Monster", "Wotakoi", "Oregariu"
    ],
    "Total Episodes": [
        1014, 720, 366, 366, 77, 50, 37, 85, 12, 144, 22, 12, 24, 24, 11, 70, 12, 48
    ],
    "Genre": [
        "Adventure", "Shounen", "Comedy", "Action", "Dark Fantasy", "Mystery", "Crime",
        "Sports", "Mystery", "Shounen", "Detective", "Mystery", "Drama", "Action",
        "Drama", "Crime", "Romance", "Romance"
    ]
}
df = pd.DataFrame(data)
```

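```py
df
```

| | Anime Names | Total Episodes | Genre |
|---|---|---|---|
| 0 | One Piece | 1014 | Adventure |
| 1 | Naruto | 720 | Shounen |
| 2 | Gintama | 366 | Comedy |
| 3 | Bleach | 366 | Action |
| 4 | Attack On Titan | 77 | Dark Fantasy |
| 5 | Code Geass | 50 | Mystery |
| 6 | Death Note | 37 | Crime |
| 7 | Haikyuu | 85 | Sports |
| 8 | Erased | 12 | Mystery |
| 9 | Hunter X Hunter | 144 | Shounen |
| 10 | Hyouka | 22 | Detective |
| 11 | No Game No Life | 12 | Mystery |
| 12 | Kakegurui | 24 | Drama |
| 13 | Jujutsu Kaisen | 24 | Action |
| 14 | Anohana | 11 | Drama |
| 15 | Monster | 70 | Crime |
| 16 | Wotakoi | 12 | Romance |
| 17 | Oregariu | 48 | Romance |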


There are three different ways to filter the dataset:

## Single Filtering

### First: Using Simple Condition


```py
above_100_episodes = df[df['Total Episodes'] > 100]
above_100_episodes
```

| | Anime Names | Total Episodes | Genre |
|---|---|---|---|
| 0 | One Piece | 1014 | Adventure |
| 1 | Naruto | 720 | Shounen |
| 2 | Gintama | 366 | Comedy |
| 3 | Bleach | 366 | Action |
| 9 | Hunter X Hunter | 144 | Shounen |


### Second: Using an Anonymous Function (map and lambda)

```py
above_100 = df[df['Total Episodes'].map(lambda x: x > 100)]
above_100
```

| | Anime Names | Total Episodes | Genre |
|---|---|---|---|
| 0 | One Piece | 1014 | Adventure |
| 1 | Naruto | 720 | Shounen |
| 2 | Gintama | 366 | Comedy |
| 3 | Bleach | 366 | Action |
| 9 | Hunter X Hunter | 144 | Shounen |


### Third: Using Series


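```py
# Build a list of True/False values, one per row
above_100_check = list()
for i in df['Total Episodes']:
    if i > 100:
        above_100_check.append(True)
    else:
        above_100_check.append(False)

# Wrap the list in a Series and use it as a boolean mask.
# The Series gets a default 0..n-1 index, which matches df's index here.
ser = pd.Series(above_100_check)
above_100_series = df[ser]
above_100_series
```

| | Anime Names | Total Episodes | Genre |
|---|---|---|---|
| 0 | One Piece | 1014 | Adventure |
| 1 | Naruto | 720 | Shounen |
| 2 | Gintama | 366 | Comedy |
| 3 | Bleach | 366 | Action |
| 9 | Hunter X Hunter | 144 | Shounen |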
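All three approaches above build the same boolean Series (a mask) and pass it to `df[...]`. One further option not used in the cells above is `DataFrame.query()`, which takes the condition as a string. Column names containing spaces are wrapped in backticks, which is supported since pandas 0.25. Here is a sketch on a trimmed-down copy of the anime data:

```python
import pandas as pd

df = pd.DataFrame({
    "Anime Names": ["One Piece", "Naruto", "Erased", "Hunter X Hunter"],
    "Total Episodes": [1014, 720, 12, 144],
    "Genre": ["Adventure", "Shounen", "Mystery", "Shounen"],
})

mask = df["Total Episodes"] > 100        # the boolean Series behind every method above
print(mask.tolist())                      # [True, True, False, True]

above_100 = df.query("`Total Episodes` > 100")   # same rows as df[mask]
print(above_100)
```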


## Multiple Filtering

### 1. Using Simple Logical Conditions

```py
above_100_shounen = df[(df['Total Episodes'] > 100) & (df['Genre'] == 'Shounen')]
above_100_shounen
```

| | Anime Names | Total Episodes | Genre |
|---|---|---|---|
| 1 | Naruto | 720 | Shounen |
| 9 | Hunter X Hunter | 144 | Shounen |


### 2. Using Map and Lambda

```py
less_than_50_drama = df[
    df['Total Episodes'].map(lambda x: x <= 50)
    & df['Genre'].map(lambda anime: anime == "Drama")
]
less_than_50_drama
```

| | Anime Names | Total Episodes | Genre |
|---|---|---|---|
| 12 | Kakegurui | 24 | Drama |
| 14 | Anohana | 11 | Drama |
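When combining conditions, pandas uses `&` (and), `|` (or) and `~` (not) on boolean Series, not Python's `and`, `or` and `not`. Each comparison needs its own parentheses, because `&` and `|` bind more tightly than `>` or `==`. The sketch below uses a trimmed-down copy of the anime data. It also shows `isin()` as a compact alternative to chaining several `|` conditions.

```python
import pandas as pd

df = pd.DataFrame({
    "Anime Names": ["One Piece", "Naruto", "Kakegurui", "Anohana", "Wotakoi"],
    "Total Episodes": [1014, 720, 24, 11, 12],
    "Genre": ["Adventure", "Shounen", "Drama", "Drama", "Romance"],
})

# | means OR: a row is kept if either condition holds.
drama_or_long = df[(df["Genre"] == "Drama") | (df["Total Episodes"] > 500)]

# ~ means NOT: it inverts a mask.
not_drama = df[~(df["Genre"] == "Drama")]

# isin() matches any value from a list.
drama_or_romance = df[df["Genre"].isin(["Drama", "Romance"])]

print(drama_or_long)
print(not_drama)
print(drama_or_romance)
```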


### Maximum and Minimum Values in the Dataset

```py
maximum_episodes = df[df['Total Episodes'] == df['Total Episodes'].max()]
maximum_episodes
```

| | Anime Names | Total Episodes | Genre |
|---|---|---|---|
| 0 | One Piece | 1014 | Adventure |


```py
minimum_episodes = df[df['Total Episodes'] == df['Total Episodes'].min()]
minimum_episodes
```

| | Anime Names | Total Episodes | Genre |
|---|---|---|---|
| 14 | Anohana | 11 | Drama |
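Comparing against `max()` or `min()` returns every row that ties for the extreme value. If you only need the first matching row, `idxmax()` and `idxmin()` return its index label. `nlargest()` and `nsmallest()` return the top or bottom `n` rows, sorted by a column. Here is a sketch on a trimmed-down copy of the anime data:

```python
import pandas as pd

df = pd.DataFrame({
    "Anime Names": ["One Piece", "Naruto", "Gintama", "Bleach", "Anohana"],
    "Total Episodes": [1014, 720, 366, 366, 11],
})

# idxmax/idxmin return the index label of the first max/min value.
longest = df.loc[df["Total Episodes"].idxmax()]
shortest = df.loc[df["Total Episodes"].idxmin()]

# nlargest keeps the first occurrence on ties (Gintama before Bleach).
top_three = df.nlargest(3, "Total Episodes")

print(longest)
print(shortest)
print(top_three)
```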


```py
dataset.head(3)
```

| | show_id | type | title | director | cast | country | date_added | release_year | rating | duration | listed_in | description |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | s1 | Movie | Dick Johnson Is Dead | Kirsten Johnson | NaN | United States | September 25, 2021 | 2020 | PG-13 | 90 min | Documentaries | As her father nears the end of his life, filmm... |
| 1 | s2 | TV Show | Blood & Water | NaN | Ama Qamata, Khosi Ngema, Gail Mabalane, Thaban... | South Africa | September 24, 2021 | 2021 | TV-MA | 2 Seasons | International TV Shows, TV Dramas, TV Mysteries | After crossing paths at a party, a Cape Town t... |
| 2 | s3 | TV Show | Ganglands | Julien Leclercq | Sami Bouajila, Tracy Gotoas, Samuel Jouy, Nabi... | NaN | September 24, 2021 | 2021 | TV-MA | 1 Season | Crime TV Shows, International TV Shows, TV Act... | To protect his family from a powerful drug lor... |


```py
latest_release_year = dataset[dataset['release_year'] == dataset['release_year'].max()]
```

```py
print(dataset['release_year'].max())
print('\n\n')
latest_release_year
```

```
2021
```

| | show_id | type | title | director | cast | country | date_added | release_year | rating | duration | listed_in | description |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | s2 | TV Show | Blood & Water | NaN | Ama Qamata, Khosi Ngema, Gail Mabalane, Thaban... | South Africa | September 24, 2021 | 2021 | TV-MA | 2 Seasons | International TV Shows, TV Dramas, TV Mysteries | After crossing paths at a party, a Cape Town t... |
| 2 | s3 | TV Show | Ganglands | Julien Leclercq | Sami Bouajila, Tracy Gotoas, Samuel Jouy, Nabi... | NaN | September 24, 2021 | 2021 | TV-MA | 1 Season | Crime TV Shows, International TV Shows, TV Act... | To protect his family from a powerful drug lor... |
| 3 | s4 | TV Show | Jailbirds New Orleans | NaN | NaN | NaN | September 24, 2021 | 2021 | TV-MA | 1 Season | Docuseries, Reality TV | Feuds, flirtations and toilet talk go down amo... |
| 4 | s5 | TV Show | Kota Factory | NaN | Mayur More, Jitendra Kumar, Ranjan Raj, Alam K... | India | September 24, 2021 | 2021 | TV-MA | 2 Seasons | International TV Shows, Romantic TV Shows, TV ... | In a city of coaching centers known to train I... |
| 5 | s6 | TV Show | Midnight Mass | Mike Flanagan | Kate Siegel, Zach Gilford, Hamish Linklater, H... | NaN | September 24, 2021 | 2021 | TV-MA | 1 Season | TV Dramas, TV Horror, TV Mysteries | The arrival of a charismatic young priest brin... |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1468 | s1469 | Movie | What Happened to Mr. Cha? | Kim Dong-kyu | Cha In-pyo, Cho Dal-hwan, Song Jae-ryong | South Korea | January 1, 2021 | 2021 | TV-MA | 102 min | Comedies, International Movies | With the peak of his career long behind him, a... |
| 1551 | s1552 | TV Show | Hilda | NaN | Bella Ramsey, Ameerah Falzon-Ojo, Oliver Nelso... | United Kingdom, Canada, United States | December 14, 2020 | 2021 | TV-Y7 | 2 Seasons | Kids' TV | Fearless, free-spirited Hilda finds new friend... |
| 1696 | s1697 | TV Show | Polly Pocket | NaN | Emily Tennant, Shannon Chan-Kent, Kazumi Evans... | Canada, United States, Ireland | November 15, 2020 | 2021 | TV-Y | 2 Seasons | Kids' TV | After uncovering a magical locket that allows ... |
| 2920 | s2921 | TV Show | Love Is Blind | NaN | Nick Lachey, Vanessa Lachey | United States | February 13, 2020 | 2021 | TV-MA | 1 Season | Reality TV, Romantic TV Shows | Nick and Vanessa Lachey host this social exper... |


```py
old_release_year = dataset[dataset['release_year'] == dataset['release_year'].min()]
```

```py
print(dataset['release_year'].min())
print('\n\n')
old_release_year
```

```
1925
```

| | show_id | type | title | director | cast | country | date_added | release_year | rating | duration | listed_in | description |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 4250 | s4251 | TV Show | Pioneers: First Women Filmmakers* | NaN | NaN | NaN | December 30, 2018 | 1925 | TV-14 | 1 Season | TV Shows | This collection restores films from women who ... |


### Correlation

Correlation is an important statistic that tells us how two sets of values are related to each other. A positive correlation means the values tend to increase together. A negative correlation means the values in one set tend to decrease as the values in the other set increase. The correlation coefficient always lies in the range -1 to 1. By default, pandas computes the Pearson correlation coefficient, which measures linear relationships.

Let's take an example to understand correlation. Say you and your friend are sitting next to each other in an exam hall. Naturally, both of you will cheat and write down similar answers. As a result, your marks and your friend's marks move together: when your marks go up, your friend's tend to go up as well, and when yours go down, theirs go down too. So the correlation between the two sets of marks is strong. As a common rule of thumb, a correlation of 0.6 or more is considered a strong positive correlation, and a correlation of -0.6 or less is considered a strong negative correlation.

Now let's take another scenario. Say you and a stranger are sitting next to each other in an exam hall. You will rarely cheat with a stranger, so your marks and the stranger's marks are unrelated. In that case the correlation is weak, and its value will be close to 0.
```py
# Syntax
df.corr()
```
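By default `corr()` computes the Pearson correlation coefficient, r = Σ(x - x̄)(y - ȳ) / √(Σ(x - x̄)² · Σ(y - ȳ)²). The sketch below uses hypothetical exam marks for the two scenarios above. It computes r by hand for you and your friend, then checks the result against pandas.

```python
import numpy as np
import pandas as pd

# Hypothetical exam marks for the two scenarios.
marks = pd.DataFrame({
    "you":      [35, 50, 62, 70, 88],
    "friend":   [38, 52, 60, 73, 90],   # copies from you: moves with your marks
    "stranger": [71, 40, 66, 52, 58],   # unrelated to your marks
})

# Pearson r computed by hand for "you" vs "friend".
x = marks["you"]
y = marks["friend"]
r_manual = ((x - x.mean()) * (y - y.mean())).sum() / np.sqrt(
    ((x - x.mean()) ** 2).sum() * ((y - y.mean()) ** 2).sum()
)

corr_matrix = marks.corr()   # Pearson correlation by default
print(corr_matrix.round(2))
print(round(r_manual, 4), round(corr_matrix.loc["you", "friend"], 4))
```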

Let's take a dataset with lots of numeric data and use both the `describe()` and `corr()` methods on it.

```py
numeric = pd.read_csv('random.csv')
```

```py
numeric.head(2)
```

| | id | diagnosis | radius_mean | texture_mean | perimeter_mean | area_mean | smoothness_mean | compactness_mean | concavity_mean | concave points_mean | ... | texture_worst | perimeter_worst | area_worst | smoothness_worst | compactness_worst | concavity_worst | concave points_worst | symmetry_worst | fractal_dimension_worst | Unnamed: 32 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 842302 | M | 17.99 | 10.38 | 122.8 | 1001.0 | 0.1184 | 0.2776 | 0.3001 | 0.1471 | ... | 17.33 | 184.6 | 2019.0 | 0.1622 | 0.6656 | 0.7119 | 0.2654 | 0.4601 | 0.1189 | NaN |
| 1 | 842517 | M | 20.57 | 17.77 | 132.9 | 1326.0 | 0.08474 | 0.07864 | 0.0869 | 0.07017 | ... | 23.41 | 158.8 | 1956.0 | 0.1238 | 0.1866 | 0.2416 | 0.186 | 0.275 | 0.08902 | NaN |


```py
numeric.describe()
```

| | id | radius_mean | texture_mean | perimeter_mean | area_mean | smoothness_mean | compactness_mean | concavity_mean | concave points_mean | symmetry_mean | ... | texture_worst | perimeter_worst | area_worst | smoothness_worst | compactness_worst | concavity_worst | concave points_worst | symmetry_worst | fractal_dimension_worst | Unnamed: 32 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 569.0 | 569.0 | 569.0 | 569.0 | 569.0 | 569.0 | 569.0 | 569.0 | 569.0 | 569.0 | ... | 569.0 | 569.0 | 569.0 | 569.0 | 569.0 | 569.0 | 569.0 | 569.0 | 569.0 | 0.0 |
| mean | 30371830.0 | 14.127292 | 19.289649 | 91.969033 | 654.889104 | 0.09636 | 0.104341 | 0.088799 | 0.048919 | 0.181162 | ... | 25.677223 | 107.261213 | 880.583128 | 0.132369 | 0.254265 | 0.272188 | 0.114606 | 0.290076 | 0.083946 | NaN |
| std | 125020600.0 | 3.524049 | 4.301036 | 24.298981 | 351.914129 | 0.014064 | 0.052813 | 0.07972 | 0.038803 | 0.027414 | ... | 6.146258 | 33.602542 | 569.356993 | 0.022832 | 0.157336 | 0.208624 | 0.065732 | 0.061867 | 0.018061 | NaN |
| min | 8670.0 | 6.981 | 9.71 | 43.79 | 143.5 | 0.05263 | 0.01938 | 0.0 | 0.0 | 0.106 | ... | 12.02 | 50.41 | 185.2 | 0.07117 | 0.02729 | 0.0 | 0.0 | 0.1565 | 0.05504 | NaN |
| 25% | 869218.0 | 11.7 | 16.17 | 75.17 | 420.3 | 0.08637 | 0.06492 | 0.02956 | 0.02031 | 0.1619 | ... | 21.08 | 84.11 | 515.3 | 0.1166 | 0.1472 | 0.1145 | 0.06493 | 0.2504 | 0.07146 | NaN |
| 50% | 906024.0 | 13.37 | 18.84 | 86.24 | 551.1 | 0.09587 | 0.09263 | 0.06154 | 0.0335 | 0.1792 | ... | 25.41 | 97.66 | 686.5 | 0.1313 | 0.2119 | 0.2267 | 0.09993 | 0.2822 | 0.08004 | NaN |
| 75% | 8813129.0 | 15.78 | 21.8 | 104.1 | 782.7 | 0.1053 | 0.1304 | 0.1307 | 0.074 | 0.1957 | ... | 29.72 | 125.4 | 1084.0 | 0.146 | 0.3391 | 0.3829 | 0.1614 | 0.3179 | 0.09208 | NaN |
| max | 911320500.0 | 28.11 | 39.28 | 188.5 | 2501.0 | 0.1634 | 0.3454 | 0.4268 | 0.2012 | 0.304 | ... | 49.54 | 251.2 | 4254.0 | 0.2226 | 1.058 | 1.252 | 0.291 | 0.6638 | 0.2075 | NaN |


```py
# numeric_only=True skips non-numeric columns such as "diagnosis".
# Older pandas versions dropped them silently; pandas 2.0+ raises an error without it.
numeric.corr(numeric_only=True)
```

| | id | radius_mean | texture_mean | perimeter_mean | area_mean | smoothness_mean | compactness_mean | concavity_mean | concave points_mean | symmetry_mean | ... | texture_worst | perimeter_worst | area_worst | smoothness_worst | compactness_worst | concavity_worst | concave points_worst | symmetry_worst | fractal_dimension_worst | Unnamed: 32 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| id | 1.0 | 0.074626 | 0.09977 | 0.073159 | 0.096893 | -0.012968 | 9.6e-05 | 0.05008 | 0.044158 | -0.022114 | ... | 0.06472 | 0.079986 | 0.107187 | 0.010338 | -0.002968 | 0.023203 | 0.035174 | -0.044224 | -0.029866 | NaN |
| radius_mean | 0.074626 | 1.0 | 0.323782 | 0.997855 | 0.987357 | 0.170581 | 0.506124 | 0.676764 | 0.822529 | 0.147741 | ... | 0.297008 | 0.965137 | 0.941082 | 0.119616 | 0.413463 | 0.526911 | 0.744214 | 0.163953 | 0.007066 | NaN |
| texture_mean | 0.09977 | 0.323782 | 1.0 | 0.329533 | 0.321086 | -0.023389 | 0.236702 | 0.302418 | 0.293464 | 0.071401 | ... | 0.912045 | 0.35804 | 0.343546 | 0.077503 | 0.27783 | 0.301025 | 0.295316 | 0.105008 | 0.119205 | NaN |
| perimeter_mean | 0.073159 | 0.997855 | 0.329533 | 1.0 | 0.986507 | 0.207278 | 0.556936 | 0.716136 | 0.850977 | 0.183027 | ... | 0.303038 | 0.970387 | 0.94155 | 0.150549 | 0.455774 | 0.563879 | 0.771241 | 0.189115 | 0.051019 | NaN |
| area_mean | 0.096893 | 0.987357 | 0.321086 | 0.986507 | 1.0 | 0.177028 | 0.498502 | 0.685983 | 0.823269 | 0.151293 | ... | 0.287489 | 0.95912 | 0.959213 | 0.123523 | 0.39041 | 0.512606 | 0.722017 | 0.14357 | 0.003738 | NaN |
| smoothness_mean | -0.012968 | 0.170581 | -0.023389 | 0.207278 | 0.177028 | 1.0 | 0.659123 | 0.521984 | 0.553695 | 0.557775 | ... | 0.036072 | 0.238853 | 0.206718 | 0.805324 | 0.472468 | 0.434926 | 0.503053 | 0.394309 | 0.499316 | NaN |
| compactness_mean | 9.6e-05 | 0.506124 | 0.236702 | 0.556936 | 0.498502 | 0.659123 | 1.0 | 0.883121 | 0.831135 | 0.602641 | ... | 0.248133 | 0.59021 | 0.509604 | 0.565541 | 0.865809 | 0.816275 | 0.815573 | 0.510223 | 0.687382 | NaN |
| concavity_mean | 0.05008 | 0.676764 | 0.302418 | 0.716136 | 0.685983 | 0.521984 | 0.883121 | 1.0 | 0.921391 | 0.500667 | ... | 0.299879 | 0.729565 | 0.675987 | 0.448822 | 0.754968 | 0.884103 | 0.861323 | 0.409464 | 0.51493 | NaN |
| concave points_mean | 0.044158 | 0.822529 | 0.293464 | 0.850977 | 0.823269 | 0.553695 | 0.831135 | 0.921391 | 1.0 | 0.462497 | ... | 0.292752 | 0.855923 | 0.80963 | 0.452753 | 0.667454 | 0.752399 | 0.910155 | 0.375744 | 0.368661 | NaN |
| symmetry_mean | -0.022114 | 0.147741 | 0.071401 | 0.183027 | 0.151293 | 0.557775 | 0.602641 | 0.500667 | 0.462497 | 1.0 | ... | 0.090651 | 0.219169 | 0.177193 | 0.426675 | 0.4732 | 0.433721 | 0.430297 | 0.699826 | 0.438413 | NaN |


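There is no single "ideal" correlation value between two features. As a rule of thumb, a correlation of 0.6 or more (or -0.6 or less) is considered strong. For example, radius_mean and perimeter_mean above have a correlation of 0.997855, so they carry almost the same information.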
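With many features, it helps to list only the strongly correlated pairs. The sketch below applies the 0.6 rule of thumb to a hypothetical frame shaped like the dataset above; its values are illustrative, not read from `random.csv`. It visits each pair of columns once with `itertools.combinations`.

```python
from itertools import combinations

import pandas as pd

# Hypothetical values; "diagnosis" is text, like the column in random.csv.
df = pd.DataFrame({
    "diagnosis": ["M", "M", "B", "B", "B"],
    "radius": [17.99, 20.57, 11.42, 12.45, 13.08],
    "perimeter": [122.8, 132.9, 77.58, 82.57, 86.24],
    "smoothness": [0.1184, 0.08474, 0.1425, 0.1003, 0.1075],
})

# numeric_only=True leaves out the text column (required on pandas 2.x and later).
corr = df.corr(numeric_only=True)

# Visit each pair of columns once and keep those with |r| >= 0.6.
strong_pairs = []
for a, b in combinations(corr.columns, 2):
    r = corr.loc[a, b]
    if abs(r) >= 0.6:
        strong_pairs.append((a, b, round(float(r), 3)))

print(strong_pairs)
```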