## **Title: Getting introduced to data analytics libraries in Python and R**

Lab Objectives: To effectively use libraries for data analytics.

Lab Outcomes (LO): Explore various data analytics libraries in R and Python. (LO1)

## **Python Libraries**



**PyTorch-** PyTorch is an open-source machine learning library for Python that is widely used for developing and training deep learning models.

PyTorch provides two main features:
> An n-dimensional tensor, similar to a NumPy array but able to run on GPUs.

> Automatic differentiation for building and training neural networks.

PyTorch has gained popularity for its dynamic computational graph, which is built while the code runs rather than defined ahead of time. This allows more flexibility and easier debugging than static-graph frameworks, and makes PyTorch well suited to natural language processing, computer vision, and other deep learning applications.
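The two features can be sketched as follows, assuming PyTorch is installed and importable as `torch`. The function `f` and the input values are made up for illustration. The plain Python `if` inside `f` is the kind of run-time control flow that a dynamic graph permits.

```python
import torch


def f(x):
    # Ordinary Python control flow decides which operations are recorded.
    if x.sum() > 0:
        return (x ** 2).sum()
    return (-x).sum()


# A 1-D tensor that records operations for automatic differentiation.
x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

y = f(x)        # forward pass: the graph is built as the code runs
y.backward()    # backward pass: fills x.grad with dy/dx

print(y.item())  # value of the sum of squares
print(x.grad)    # gradient, which is 2 * x for this branch
```

Changing the sign of the inputs would send the computation down the other branch, and `backward()` would differentiate whichever branch actually ran.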

**Scrapy-** Scrapy is an open-source Python framework that aims to make web scraping easier. You can build robust and scalable spiders with its comprehensive set of built-in features, such as:
> HTTP connections.

> Support for CSS Selectors and XPath expressions.

> Cookie and session management.

> Support for automated retries.

> Concurrency management.

> Built-in crawling capabilities.

> JavaScript rendering with Splash (through the scrapy-splash plugin).

**TensorFlow-** TensorFlow is an open-source, end-to-end machine learning library. It provides a comprehensive set of tools and resources for machine learning tasks such as neural networks, natural language processing, and computer vision.

It supports the following:
> Multidimensional-array based numeric computation

> GPU and distributed processing

> Automatic differentiation

> Model construction, training, and export

**Keras-** Keras is an open-source, high-level neural networks API written in Python. It is a user-friendly and efficient tool for building deep learning models. TensorFlow adopted Keras as its official high-level API, and Keras is now tightly integrated into the TensorFlow core library.

**PyBrain-** PyBrain is an open-source, modular machine learning library implemented in Python.

PyBrain has the following features:
> PyBrain supports neural networks such as feed-forward networks and recurrent networks.

> PyBrain supports various dataset types for training, validating, and testing networks.

> PyBrain trains a network on the training data given to it using trainers such as BackpropTrainer, which provides methods like trainUntilConvergence.

> PyBrain can work with other libraries such as Matplotlib (through its pyplot interface) to visualize data.

## **R Libraries**

**tidyr-** The goal of tidyr is to help you create tidy data. Tidy data is data where:

1) Every column is a variable.

2) Every row is an observation.

3) Every cell is a single value.

Tidy data describes a standard way of storing data that is used wherever possible throughout the tidyverse. If you ensure that your data is tidy, you’ll spend less time fighting with the tools and more time working on your analysis.
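The three rules are easier to see on a small table. The sketch below uses pandas rather than tidyr, so it is only an analogue: `DataFrame.melt` plays roughly the role of tidyr's `pivot_longer`. The values are the first three rows of the `Group.1` and `Group.2` columns used later in this lab; the column names `group` and `value` are chosen here for illustration.

```python
import pandas as pd

# Wide layout: the variable "group" is hidden in the column headers.
wide = pd.DataFrame({
    "S.No": [1, 2, 3],
    "Group.1": [23, 345, 76],
    "Group.2": [117, 89, 66],
})

# Long (tidy) layout: each row is one observation of (S.No, group, value).
tidy = wide.melt(id_vars="S.No", var_name="group", value_name="value")

print(tidy)
```

After reshaping, every column is a variable, every row is a single measurement, and every cell holds one value.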

**dplyr-** dplyr is a powerful R package for manipulating, cleaning, and summarizing tabular data such as data frames. It makes data exploration and data manipulation easy and fast in R. The package provides functions for the most commonly used data manipulation operations: filtering rows with filter(), selecting or dropping columns with select(), sorting data with arrange(), adding columns with mutate(), and aggregating data with summarise().
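For readers coming from the Python half of this lab, the same kinds of operations can be sketched in pandas. This is a rough analogue, not dplyr itself; the small data frame and the `age_months` column are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Abhi", "Bhavesh", "Chaman", "Dimri"],
    "age": [7, 5, 9, 16],
    "school": ["yes", "yes", "no", "no"],
})

older = df[df["age"] > 6]                            # filter rows
cols = older[["name", "age"]]                        # select columns
ordered = cols.sort_values("age", ascending=False)   # sort rows
with_months = ordered.assign(age_months=ordered["age"] * 12)  # add a column
mean_age = df.groupby("school")["age"].mean()        # aggregate per group

print(with_months)
print(mean_age)
```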


**readr-** The goal of readr is to provide a fast and friendly way to read rectangular data from delimited files, such as comma-separated values (CSV) and tab-separated values (TSV). It is designed to parse many types of data found in the wild, while providing an informative problem report when parsing leads to unexpected results.
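Reading delimited text can be illustrated in Python with `pandas.read_csv`, which covers roughly the same ground as readr's `read_csv` and `read_tsv`. The sketch below parses an in-memory string instead of a file so that it is self-contained; the sample rows are made up for illustration.

```python
import io

import pandas as pd

# A small CSV document with one missing height value.
csv_text = "name,age,ht\nAbhi,7,46\nBhavesh,5,\nDimri,16,69\n"

# Comma-separated input; column types are inferred while parsing.
df = pd.read_csv(io.StringIO(csv_text))

# The same data as tab-separated values, read with an explicit separator.
tsv_text = csv_text.replace(",", "\t")
tsv = pd.read_csv(io.StringIO(tsv_text), sep="\t")

print(df)
print(df.dtypes)
```

The empty `ht` field is parsed as a missing value, which is why that column is inferred as floating point rather than integer.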

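**stringr-** stringr provides a consistent set of functions for working with strings in R. The vast majority of stringr functions work with patterns; they are parameterised by the task they perform and the types of patterns they match. The main groups of functions in stringr are: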


> Character manipulation functions, which let you manipulate individual characters within the strings of a character vector.

> Whitespace tools to add, remove, and manipulate whitespace.

> Locale-sensitive operations, whose results vary from locale to locale.

> Pattern matching functions, which recognise four engines of pattern description: regular expressions (the default), fixed bytes, locale-sensitive collation, and boundary analysis.
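The function groups above have close counterparts in Python's built-in string methods and the `re` module. The sketch below is an analogue only; the stringr function named in each comment is the nearest equivalent, and the sample strings are made up for illustration.

```python
import re

names = ["  Abhi ", "Bhavesh", " chaman"]

# Whitespace tools.
trimmed = [s.strip() for s in names]        # similar to str_trim
padded = [s.ljust(8) for s in trimmed]      # similar to str_pad(side = "right")

# Character manipulation.
upper = [s.upper() for s in trimmed]        # similar to str_to_upper
first_two = [s[:2] for s in trimmed]        # similar to str_sub(x, 1, 2)

# Pattern matching with a regular expression (case-insensitive).
pattern = re.compile(r"^[ac]", re.IGNORECASE)
starts_with_a_or_c = [bool(pattern.search(s)) for s in trimmed]  # str_detect

# Pattern matching with a fixed (literal) string.
contains_ha = ["ha" in s for s in trimmed]  # str_detect with fixed("ha")

print(trimmed, upper, starts_with_a_or_c, contains_ha)
```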





**jsonlite-** The jsonlite package is a JSON parser/generator optimized for the web. Its main strength is that it implements a bidirectional mapping between JSON data and the most important R data types. Thereby we can convert between R objects and JSON without loss of type or information, and without the need for any manual data munging. This is ideal for interacting with web APIs, or to build pipelines where data structures seamlessly flow in and out of R using JSON.
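The round trip between native objects and JSON text can be sketched with Python's standard `json` module, where `json.dumps` and `json.loads` play roles similar to jsonlite's `toJSON` and `fromJSON`. The record below is made up for illustration.

```python
import json

record = {
    "name": "Abhi",
    "age": 7,
    "ht": None,            # a missing value becomes JSON null
    "scores": [46, 69.5],
    "enrolled": True,
}

text = json.dumps(record)      # Python object -> JSON text
restored = json.loads(text)    # JSON text -> Python object

print(text)
print(restored == record)
```

Strings, numbers, booleans, lists, and missing values all survive the round trip unchanged, which is the property that makes JSON convenient for exchanging data with web APIs.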

## **Code(Python):**

In [None]:
!pip install pandas
!pip install scikit-learn



In [None]:
import pandas as pd
from sklearn.datasets import load_iris

In [None]:
# Load the Iris dataset and wrap its feature matrix in a DataFrame
data = load_iris()
df = pd.DataFrame(data.data, columns=data.feature_names)

     sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)
30                 4.8               3.1                1.6               0.2
20                 5.4               3.4                1.7               0.2
129                7.2               3.0                5.8               1.6
42                 4.4               3.2                1.3               0.2
27                 5.2               3.5                1.5               0.2
34                 4.9               3.1                1.5               0.2
44                 5.1               3.8                1.9               0.4
13                 4.3               3.0                1.1               0.1
54                 6.5               2.8                4.6               1.5
70                 5.9               3.2                4.8               1.8


### **Output**

In [None]:
print(df.sample(n=10))

     sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)
44                 5.1               3.8                1.9               0.4
49                 5.0               3.3                1.4               0.2
77                 6.7               3.0                5.0               1.7
118                7.7               2.6                6.9               2.3
120                6.9               3.2                5.7               2.3
68                 6.2               2.2                4.5               1.5
83                 6.0               2.7                5.1               1.6
72                 6.3               2.5                4.9               1.5
26                 5.0               3.4                1.6               0.4
6                  4.6               3.4                1.4               0.3


## **Code(R)**

In [None]:
install.packages("tidyverse")


Installing package into ‘/usr/local/lib/R/site-library’
(as ‘lib’ is unspecified)



In [None]:
library(dplyr)
library(tidyr)

# Data frame with one row per serial number and one column per group
n <- 10
tidy_dataframe <- data.frame(
  S.No = 1:n,
  Group.1 = c(23, 345, 76, 212, 88,
              199, 72, 35, 90, 265),
  Group.2 = c(117, 89, 66, 334, 90,
              101, 178, 233, 45, 200),
  Group.3 = c(29, 101, 239, 289, 176,
              320, 89, 109, 199, 56))

In [None]:
# Data frame with two missing values (NA) in the ht column
d <- data.frame(name = c("Abhi", "Bhavesh", "Chaman", "Dimri"),
                age = c(7, 5, 9, 16),
                ht = c(46, NA, NA, 69),
                school = c("yes", "yes", "no", "no"))




### **Output:**

In [None]:
print(tidy_dataframe)

print(d)
rows_with_na <- d %>% filter(is.na(ht))

print(rows_with_na)

rows_without_na <- d %>% filter(!is.na(ht))
print(rows_without_na)

   S.No Group.1 Group.2 Group.3
1     1      23     117      29
2     2     345      89     101
3     3      76      66     239
4     4     212     334     289
5     5      88      90     176
6     6     199     101     320
7     7      72     178      89
8     8      35     233     109
9     9      90      45     199
10   10     265     200      56
     name age ht school
1    Abhi   7 46    yes
2 Bhavesh   5 NA    yes
3  Chaman   9 NA     no
4   Dimri  16 69     no
     name age ht school
1 Bhavesh   5 NA    yes
2  Chaman   9 NA     no
   name age ht school
1  Abhi   7 46    yes
2 Dimri  16 69     no
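For comparison with the Python part of the lab, the same missing-value filter can be sketched in pandas. This assumes pandas is installed and rebuilds the data frame `d` from the R cell above; `isna` and `notna` play the role of `is.na` inside `filter`.

```python
import pandas as pd

d = pd.DataFrame({
    "name": ["Abhi", "Bhavesh", "Chaman", "Dimri"],
    "age": [7, 5, 9, 16],
    "ht": [46, None, None, 69],   # None is stored as NaN (missing)
    "school": ["yes", "yes", "no", "no"],
})

rows_with_na = d[d["ht"].isna()]       # rows where ht is missing
rows_without_na = d[d["ht"].notna()]   # rows where ht is present

print(rows_with_na)
print(rows_without_na)
```

Unlike the R output, pandas keeps the original row labels after filtering; calling `reset_index(drop=True)` would renumber them.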


## **Conclusion**

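We explored several data analytics libraries in Python (PyTorch, Scrapy, TensorFlow, Keras, and PyBrain) and in R (tidyr, dplyr, readr, stringr, and jsonlite). In practice, we loaded the Iris dataset into a pandas DataFrame and sampled rows from it, and we used dplyr to separate rows with and without missing values, which gave us a first impression of the capabilities and user-friendliness of these tools.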