# **Tasks to be performed:**

### **Task1. Explore Top-5 Data Analytics Libraries in Python**
**NumPy:**

Features:
* Powerful N-Dimensional Arrays
* Numerical Computing Tools
* Open Source
* Easy to Use

Applications:
* Data Science
* Machine Learning
* Scientific Research
* Financial Modeling

**Pandas:**

Features:
* Data cleaning
* Data filtering and selection
* Data aggregation
* Data visualization

Applications:
* Economics
* Statistics
* Recommendation systems
* Neuroscience
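The cleaning, filtering, and aggregation features listed above can be sketched in a few lines. This is only a minimal illustration: the `sales` table, its column names, and its values are made up for the example and are not part of the notebook's dataset.

```python
import pandas as pd

# Hypothetical sample data (assumption: not from the notebook's dataset)
sales = pd.DataFrame({
    "region": ["North", "South", "North", "South", "North"],
    "amount": [100.0, None, 250.0, 80.0, 50.0],
})

# Data cleaning: replace the missing amount with 0
cleaned = sales.fillna({"amount": 0.0})

# Data filtering and selection: keep rows with an amount above 60
large = cleaned[cleaned["amount"] > 60]

# Data aggregation: total amount per region
totals = cleaned.groupby("region")["amount"].sum()

print(large)
print(totals)
```

Filling missing values with 0 is just one possible choice here; dropping the row or using a column statistic may suit other data better.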

**SciPy:**

Features:
* Fundamental Algorithms
* Easy to Use
* Open Source
* Performant

Applications:
* Numerical interpolation
* Sparse linear systems
* Linear algebra
* Signal processing
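Three of the applications listed above (interpolation, linear algebra, and sparse linear systems) can be sketched as follows. The small arrays are assumptions chosen so the results are easy to check by hand; they do not come from the notebook.

```python
import numpy as np
from scipy import interpolate, linalg, sparse
from scipy.sparse.linalg import spsolve

# Numerical interpolation: linear interpolation between known points
xs = np.array([0.0, 1.0, 2.0])
ys = np.array([0.0, 10.0, 20.0])
f = interpolate.interp1d(xs, ys)
mid = float(f(1.5))  # value halfway between the points at x=1 and x=2

# Linear algebra: solve the dense system A x = b
A = np.array([[2.0, 0.0], [0.0, 4.0]])
b = np.array([2.0, 8.0])
x_dense = linalg.solve(A, b)

# Sparse linear system: the same system stored in compressed sparse column format
A_sparse = sparse.csc_matrix(A)
x_sparse = spsolve(A_sparse, b)

print(mid)
print(x_dense)
print(x_sparse)
```

For a 2x2 system the sparse format brings no benefit; it is shown only to illustrate the call. Sparse storage pays off when most entries of a large matrix are zero.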

**TensorFlow:**

Features:
* High-Level APIs
* Flexibility
* Scalability
* TensorFlow Lite

Applications:
* Image Classification
* Natural Language Processing (NLP)
* Speech Recognition
* Object Detection

**PyTorch:**

Features:
* Dynamic Computational Graphs
* Neural Network Module
* Ecosystem and Community
* Dynamic Neural Networks

Applications:
* Deep Learning Research
* Computer Vision
* Healthcare and Biology
* Transfer Learning

### **Task2. Explore Top-5 Data Analytics Libraries in R**
**ggplot2**

Features:
* Grammar of Graphics-based plotting system.
* Allows the creation of complex and customizable data visualizations.
* Provides a high-level interface for producing aesthetically pleasing plots.

Applications:
* Exploratory data analysis through rich visualizations.
* Creation of publication-quality plots.
* Visualizing relationships and patterns in data.

**dplyr**

Features:
* Provides a set of functions for data manipulation, such as filtering, selecting, grouping, and summarizing data.
* Offers a consistent and intuitive syntax for data wrangling.
* Facilitates the chaining of operations for efficient data transformations.

Applications:
* Data cleaning and preprocessing.
* Subsetting and filtering data.
* Aggregating and summarizing data.

**tidyr**

Features:
* Data Reshaping
* Separating and Uniting Variables
* Handling Missing Values

Applications:
* Data Cleaning and Preprocessing
* Data Tidying for Visualization
* Handling Survey Data

**stringr**

Features:
* Consistent Function Naming
* Pattern Matching
* Character Manipulation

Applications:
* Data Cleaning and Wrangling
* Text Mining and Analysis
* Regular Expression Operations

**lubridate**

Features:
* Mathematical Operations
* Parsing and Creating Date-Time Objects
* Easy Extraction of Components

Applications:
* Time Series Analysis
* Data Preprocessing for Time Series Models
* Calculating Age


### **Task3. Install 2 Libraries each for Python and R**

In [None]:
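# Install first if needed, e.g. in a notebook cell: %pip install numpy pandas scipy tensorflow torch
import numpy as np
import pandas as pd
import scipy as sc
import tensorflow as tf
import torch as pt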

In [None]:
install.packages("stringr")
install.packages("dplyr")
install.packages("tidyr")
install.packages("readr")
install.packages("lubridate")

### **Task4. Perform simple experiments on the 2 identified libraries each for Python and R**

**Libraries in Python**

**NumPy**



In [None]:
array1 = np.array([1, 2, 3])
array2 = np.array([4, 5, 6])
print("Array1: ", array1)
print("Array2: ", array2)
print("Resultant Array: ",np.add(array1, array2))

Array1:  [1 2 3]
Array2:  [4 5 6]
Resultant Array:  [5 7 9]


In [None]:
a = np.arange(12).reshape((3,4))
print("Original Array: ",a)
b = [7, 5, 4, 3]
c = np.append(a,b)
print("New Array: ", c)

Original Array:  [[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]
New Array:  [ 0  1  2  3  4  5  6  7  8  9 10 11  7  5  4  3]
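The output above is one-dimensional because `np.append` flattens both inputs when no `axis` is given. A minimal sketch of the difference, reusing the same `a` and `b` (the names `flat` and `stacked` are introduced here only for illustration):

```python
import numpy as np

a = np.arange(12).reshape((3, 4))
b = [7, 5, 4, 3]

# Without axis, np.append flattens both inputs into a 1-D array
flat = np.append(a, b)

# With axis=0, the appended values must be 2-D with the same number of columns;
# wrapping b in a list makes it a single row that is added below a
stacked = np.append(a, [b], axis=0)

print(flat.shape)     # (16,)
print(stacked.shape)  # (4, 4)
```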


In [None]:
# Find the minimum and maximum values in the array
min_val = np.min(c)
max_val = np.max(c)

print("Minimum value: ",min_val)
print("Maximum value: ",max_val)

Minimum value:  0
Maximum value:  11


**Pandas**

In [None]:
#dataset link: https://drive.google.com/file/d/1q7qK03njlzZRQ7PyYoprn12uVnE1gjH6/view?pli=1

In [None]:
import pandas as pd

df = pd.read_csv('/content/sample_data/sample_dataset_Exp1.csv')

In [None]:
df.head()

Unnamed: 0,Name,Age,City,State,DOB,Gender,City temperature,Salary
0,Alam,29,Indore,Madhya Pradesh,20-11-1991,Male,35.5,50000
1,Rohit,23,New Delhi,Delhi,19-09-1997,Male,39.0,85000
2,Bimla,35,Rohtak,Haryana,09-01-1985,Female,39.7,20000
3,Rahul,25,Kolkata,West Bengal,19-09-1995,Male,36.5,40000
4,Chaman,32,Chennai,Tamil Nadu,12-03-1988,Male,41.1,65000


In [None]:
df['State'].value_counts()

Haryana           3
Delhi             2
Madhya Pradesh    1
West Bengal       1
Tamil Nadu        1
Bihar             1
Name: State, dtype: int64

In [None]:
df.sort_values(by='Name', inplace=True)
df

Unnamed: 0,Name,Age,City,State,DOB,Gender,City temperature,Salary
0,Alam,29,Indore,Madhya Pradesh,20-11-1991,Male,35.5,50000
2,Bimla,35,Rohtak,Haryana,09-01-1985,Female,39.7,20000
4,Chaman,32,Chennai,Tamil Nadu,12-03-1988,Male,41.1,65000
6,Charu,29,New Delhi,Delhi,18-03-1992,Female,39.0,52000
7,Ganesh,39,Patna,Bihar,07-12-1981,Male,,18000
3,Rahul,25,Kolkata,West Bengal,19-09-1995,Male,36.5,40000
1,Rohit,23,New Delhi,Delhi,19-09-1997,Male,39.0,85000
5,Vivek,38,Gurugram,Haryana,22-06-1982,Male,38.9,35000
8,Vivek,38,Gurugram,Haryana,22-06-1982,Male,38.9,35000
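The sorted output above contains a repeated row for Vivek (index 5 and 8) and an empty City temperature for Ganesh. A minimal sketch of how such rows could be handled, using a few rows copied from that output with a subset of the columns. The frame name `people` and the choice to fill with the column mean are assumptions made for illustration, not steps from the notebook.

```python
import numpy as np
import pandas as pd

# A few rows copied from the output above (subset of columns)
people = pd.DataFrame(
    {
        "Name": ["Ganesh", "Rohit", "Vivek", "Vivek"],
        "State": ["Bihar", "Delhi", "Haryana", "Haryana"],
        "City temperature": [np.nan, 39.0, 38.9, 38.9],
        "Salary": [18000, 85000, 35000, 35000],
    },
    index=[7, 1, 5, 8],
)

# Drop rows that repeat an earlier row in every column (the first copy is kept)
deduped = people.drop_duplicates()

# Count missing values per column
missing = deduped.isna().sum()

# One possible choice: fill the missing temperature with the column mean
mean_temp = deduped["City temperature"].mean()
filled = deduped.fillna({"City temperature": mean_temp})

print(deduped)
print(missing)
print(filled)
```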


In [None]:
df.iloc[:9, 2:4]

Unnamed: 0,City,State
0,Indore,Madhya Pradesh
2,Rohtak,Haryana
4,Chennai,Tamil Nadu
6,New Delhi,Delhi
7,Patna,Bihar
3,Kolkata,West Bengal
1,New Delhi,Delhi
5,Gurugram,Haryana
8,Gurugram,Haryana


In [None]:
df.loc[[3, 0, 2, 4], ['Name', 'City', "DOB"]]

Unnamed: 0,Name,City,DOB
3,Rahul,Kolkata,19-09-1995
0,Alam,Indore,20-11-1991
2,Bimla,Rohtak,09-01-1985
4,Chaman,Chennai,12-03-1988
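After sorting by Name, the index labels are no longer in order, so `iloc` (position-based) and `loc` (label-based) can return different rows for the same number. A minimal sketch with three rows taken from the data above; `df_small` is a name introduced only for this example.

```python
import pandas as pd

# Three rows from the notebook's data, in Name order, keeping their original index labels
df_small = pd.DataFrame(
    {"Name": ["Alam", "Bimla", "Rohit"], "City": ["Indore", "Rohtak", "New Delhi"]},
    index=[0, 2, 1],
)

# iloc selects by position: the second row in the current order
by_position = df_small.iloc[1]["Name"]

# loc selects by label: the row whose index label is 1
by_label = df_small.loc[1, "Name"]

print(by_position)
print(by_label)
```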


**Libraries in R**

**dplyr**

In [None]:
library('dplyr')
df <- data.frame(
  id = c(10,11,12,13,14,15,16,17),
  name = c('sai','ram','deepika','sahithi','kumar','scott','Don','Lin'),
  gender = c('M','M','F','F','M','M','M','F'),
  dob = as.Date(c('1990-10-02','1981-3-24','1987-6-14','1985-8-16',
                  '1995-03-02','1991-6-21','1986-3-24','1990-8-26')),
  state = c('CA','NY',NA,NA,'DC','DW','AZ','PH'),
  row.names=c('r1','r2','r3','r4','r5','r6','r7','r8')
)
df


Attaching package: ‘dplyr’


The following objects are masked from ‘package:stats’:

    filter, lag


The following objects are masked from ‘package:base’:

    intersect, setdiff, setequal, union




Unnamed: 0_level_0,id,name,gender,dob,state
Unnamed: 0_level_1,<dbl>,<chr>,<chr>,<date>,<chr>
r1,10,sai,M,1990-10-02,CA
r2,11,ram,M,1981-03-24,NY
r3,12,deepika,F,1987-06-14,
r4,13,sahithi,F,1985-08-16,
r5,14,kumar,M,1995-03-02,DC
r6,15,scott,M,1991-06-21,DW
r7,16,Don,M,1986-03-24,AZ
r8,17,Lin,F,1990-08-26,PH


In [None]:
# filter
df %>% filter(state %in% c("CA", "AZ", "PH"))

Unnamed: 0_level_0,id,name,gender,dob,state
Unnamed: 0_level_1,<dbl>,<chr>,<chr>,<date>,<chr>
r1,10,sai,M,1990-10-02,CA
r7,16,Don,M,1986-03-24,AZ
r8,17,Lin,F,1990-08-26,PH


In [None]:
#slice: select rows by range
df %>% slice(2:6)

Unnamed: 0_level_0,id,name,gender,dob,state
Unnamed: 0_level_1,<dbl>,<chr>,<chr>,<date>,<chr>
r2,11,ram,M,1981-03-24,NY
r3,12,deepika,F,1987-06-14,
r4,13,sahithi,F,1985-08-16,
r5,14,kumar,M,1995-03-02,DC
r6,15,scott,M,1991-06-21,DW


In [None]:
rows_with_na <- df %>% filter(is.na(state))
print(rows_with_na)

   id    name gender        dob state
r3 12 deepika      F 1987-06-14  <NA>
r4 13 sahithi      F 1985-08-16  <NA>


**stringr**

In [None]:
library(stringr)
x <- c("house", "car", "plant", "telephone", "arm chair")
print(x)

[1] "house"     "car"       "plant"     "telephone" "arm chair"


In [None]:
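str_sub(x, start = 2, end = 5)  # extract characters 2 to 5 of each string
str_detect(x, "ar")             # TRUE where the string contains "ar"
str_count(x, "a")               # number of "a" characters in each string
str_to_title(x)                 # capitalize the first letter of each word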