**NLP**

**Write a program for text processing**

In [46]:
import nltk
from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer

In [57]:
# Sample text
text = "NLP is a fascinating field that combines computer science and linguistics!"
print(text)

NLP is a fascinating field that combines computer science and linguistics!


In [51]:
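# Tokenization (word_tokenize needs the Punkt tokenizer data; run
# nltk.download('punkt') once, or 'punkt_tab' on newer NLTK releases)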
tokens = word_tokenize(text)
print("Tokens:", tokens)

Tokens: ['NLP', 'is', 'a', 'fascinating', 'field', 'that', 'combines', 'computer', 'science', 'and', 'linguistics', '!']


In [54]:
# Remove stopwords
nltk.download('stopwords')
stop_words = set(stopwords.words("english"))
filtered_tokens = [word for word in tokens if word.lower() not in stop_words]
print(filtered_tokens)

['NLP', 'fascinating', 'field', 'combines', 'computer', 'science', 'linguistics', '!']


[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!


In [55]:
# Stemming
stemmer = PorterStemmer()
stemmed_words = [stemmer.stem(word) for word in filtered_tokens]
print(stemmed_words)

['nlp', 'fascin', 'field', 'combin', 'comput', 'scienc', 'linguist', '!']


**Write a program to implement NLP using spaCy**

In [56]:
import spacy

In [61]:
# Load spaCy's pre-trained small English model
# (install it once with: python -m spacy download en_core_web_sm)
nlp = spacy.load("en_core_web_sm")

In [66]:
data = "Apple is looking to buy a startup based in San Francisco for $1 billion."

In [65]:
# Process the data
doc = nlp(data)
print(doc)

Apple is looking to buy a startup based in San Francisco for $1 billion.


In [68]:
# Tokenization and stopword removal
tokens = [token.text for token in doc if not token.is_stop]
print(tokens)

['Apple', 'looking', 'buy', 'startup', 'based', 'San', 'Francisco', '$', '1', 'billion', '.']


In [70]:
# Named Entity Recognition (NER)
entities = [(ent.text, ent.label_) for ent in doc.ents]
print(entities)

[('Apple', 'ORG'), ('San Francisco', 'GPE'), ('$1 billion', 'MONEY')]


**Statistics**

**Difference between descriptive and inferential statistics**

**Definition**

* Descriptive Statistics: Involves methods for summarizing and organizing data to describe its main features.

* Inferential Statistics: Involves techniques that allow us to make generalizations or predictions about a population based on sample data.

**Purpose**

* Descriptive Statistics: Aims to provide a clear and concise summary of the data.

* Inferential Statistics: Aims to draw conclusions and make predictions about a larger population based on a sample.

**Scope of Data**

* Descriptive Statistics: Deals with the actual data collected (e.g., survey results, test scores).

* Inferential Statistics: Deals with sample data to infer characteristics of a larger population (e.g., predicting election outcomes).

**Data Presentation**

* Descriptive Statistics: Utilizes graphs, charts, and summary statistics (e.g., mean, median, mode) to present data.

* Inferential Statistics: Often involves confidence intervals, hypothesis tests, and regression models to present findings.
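The summary statistics mentioned above can be computed with Python's standard `statistics` module. This is only a minimal sketch: the `scores` list is made-up illustrative data, not taken from any real study.

```python
import statistics

# Hypothetical test scores, invented for illustration only
scores = [72, 85, 85, 90, 64, 78, 85, 92, 70, 79]

# Descriptive statistics summarize the data we actually have
summary = {
    "mean": statistics.mean(scores),
    "median": statistics.median(scores),
    "mode": statistics.mode(scores),
    "stdev": statistics.stdev(scores),  # sample standard deviation
}

for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

Nothing here generalizes beyond the ten scores in the list; the numbers only describe that data.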

**Sample vs. Population**

* Descriptive Statistics: Can be applied to both populations and samples, providing summaries of the data collected.

* Inferential Statistics: Primarily focuses on samples to make inferences about the larger population from which the sample is drawn.

**Outcome**

* Descriptive Statistics: Results are strictly descriptive and do not make predictions or generalizations.

* Inferential Statistics: Results can lead to conclusions that extend beyond the sample to the population.

**Complexity**

* Descriptive Statistics: Generally simpler and easier to compute and interpret.

* Inferential Statistics: More complex, often requiring a deeper understanding of statistical theory and methods.

**Assumptions**

* Descriptive Statistics: Generally does not require assumptions about the distribution of the data.

* Inferential Statistics: Often relies on assumptions regarding the population distribution (e.g., normality).

**Examples**

* Descriptive Statistics: Calculating the average age of participants in a study.

* Inferential Statistics: Estimating the average age of all residents of a city based on a sample of them.
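As a rough illustration of the inferential example, the sketch below estimates a population mean from a sample and attaches a confidence interval. The `sample_ages` values and the helper name `mean_confidence_interval` are hypothetical, and the interval uses a normal approximation; for a sample this small, a t-distribution would usually be more appropriate.

```python
import math
import statistics

# Hypothetical ages from a random sample of residents (illustrative values)
sample_ages = [23, 35, 41, 29, 52, 38, 46, 31, 27, 44, 36, 58, 33, 40, 49]

def mean_confidence_interval(data, confidence=0.95):
    """Approximate confidence interval for the population mean.

    Assumption: uses the normal approximation (z critical value),
    not the t-distribution.
    """
    n = len(data)
    mean = statistics.mean(data)
    standard_error = statistics.stdev(data) / math.sqrt(n)
    # Two-sided critical value, e.g. about 1.96 for 95% confidence
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * standard_error
    return mean - margin, mean + margin

low, high = mean_confidence_interval(sample_ages)
print(f"Sample mean: {statistics.mean(sample_ages):.1f}")
print(f"Approximate 95% CI for the population mean: ({low:.1f}, {high:.1f})")
```

The sample mean on its own is a descriptive result; the interval is the inferential step, because it makes a statement about the unseen population.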

**Tools and Techniques**

* Descriptive Statistics: Can usually be done with basic tools (e.g., Excel, a calculator).

* Inferential Statistics: Usually relies on statistical software (e.g., R, Python) for analysis.

**Interpretation**

* Descriptive Statistics: Interpretation is straightforward and relates directly to the data collected.

* Inferential Statistics: Requires careful interpretation, as it involves uncertainty and the potential for sampling error.
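Sampling error can be made concrete with a small simulation. The sketch below is built on assumptions: the population is synthetic (normally distributed ages with mean 40 and standard deviation 12), and the sample size of 50 and the 200 repetitions are arbitrary choices.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

# Synthetic population, assumed for illustration only
population = [random.gauss(40, 12) for _ in range(10_000)]
population_mean = statistics.mean(population)

# Draw many samples and record how far each sample mean lands from the truth
sample_means = [
    statistics.mean(random.sample(population, 50)) for _ in range(200)
]
errors = [m - population_mean for m in sample_means]

print(f"Population mean: {population_mean:.2f}")
print(f"Smallest sample mean: {min(sample_means):.2f}")
print(f"Largest sample mean: {max(sample_means):.2f}")
print(f"Spread of sample means (stdev): {statistics.stdev(sample_means):.2f}")
```

Each sample gives a slightly different estimate, which is why inferential results are reported with a measure of uncertainty rather than as a single exact number.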

**Applications**

* Descriptive Statistics: Used in reporting and summarizing findings in research papers, surveys, and business reports.

* Inferential Statistics: Used in research to test hypotheses and make predictions about future events or trends.


**Decision-Making**

* Descriptive Statistics: Helps in understanding data trends, distributions, and patterns.

* Inferential Statistics: Aids in decision-making by providing evidence to support conclusions about the population.


