#### Why does Pandas treat strings as objects?

Reason 1:
Pandas is built on top of NumPy. NumPy uses fixed-size data types, including
fixed-width string dtypes (`U`, `S`). If a string exceeds the defined width,
it gets truncated, which can lead to data loss.

Reason 2:
Python strings are variable-length objects. Each string is allocated separately
in memory and takes as much space as it needs. To hold variable-length strings
without truncation, Pandas stores references (pointers) to the Python string
objects rather than the characters themselves, and that is what the `object`
dtype means.
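A minimal sketch of what Reason 2 looks like in practice. The variable name `words` is only illustrative, and `dtype=object` is passed explicitly here (an assumption made for this example) so that the result does not depend on which default string dtype the installed pandas version chooses.

```python
import pandas as pd

# dtype=object is requested explicitly so the example behaves the same
# regardless of the default string dtype of the installed pandas version.
words = pd.Series(['ant', 'elephant', 'encyclopedia'], dtype=object)

print(words.dtype)                # the Series dtype
print(type(words.iloc[0]))        # each element is an ordinary Python str
print(words.to_numpy().itemsize)  # bytes per slot: one reference, not the string itself
print(words.str.len().tolist())   # lengths differ, so nothing was truncated
```

The per-slot size stays the same no matter how long the strings are, because each slot holds only a reference to a separately allocated string.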


In [2]:
import numpy as np
import pandas as pd

In [3]:
# NumPy string dtypes are fixed-width.
# Here we explicitly set the width to 5 characters with dtype='U5'.
# Any string longer than 5 characters is truncated.

numpy_arr = np.array(['ant', 'elephant', 'encyclopedia'], dtype='U5')
print(numpy_arr[0])
print(numpy_arr[1])
print(numpy_arr[2])


ant
eleph
encyc
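The truncation above follows from how much space NumPy reserves per element. As a small sketch (the name `fixed` is illustrative), the array's dtype and sizes can be inspected directly:

```python
import numpy as np

fixed = np.array(['ant', 'elephant', 'encyclopedia'], dtype='U5')

print(fixed.dtype)     # fixed-width Unicode dtype, 5 characters per element
print(fixed.itemsize)  # bytes reserved per element (4 bytes per character)
print(fixed.nbytes)    # total buffer size = itemsize * number of elements
print(fixed.tolist())  # every element fits in 5 characters
```

Every element occupies the same number of bytes, so `'ant'` is padded and the longer words are cut to fit.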


In [4]:
# This is not the case with pandas.
# pandas does not keep NumPy's fixed-width string dtype: even though we pass 'U5',
# the Series stores the strings as Python objects, so no value is truncated.
# Each string lives at its own memory location, sized to fit it,
# and the Series holds only references to those locations.
# This follows from Python strings being variable-length objects.
pandas_series = pd.Series(['ant' , 'elephant' , 'encyclopedia'] , dtype = 'U5')
print(pandas_series[0])
print(pandas_series[1])
print(pandas_series[2])

ant
elephant
encyclopedia


In [6]:
# What happens when we do NOT specify dtype in NumPy
# NumPy inspects all strings and sets the width equal to the longest string.

numpy_arr = np.array(['ant', 'elephant', 'encyclopedia'])
print(numpy_arr[0])  # width = 12
print(numpy_arr[1])  # width = 12
print(numpy_arr[2])  # width = 12

# Assigning a longer string AFTER creation causes truncation
numpy_arr[0] = 'fsdlfhslfhslfhdskfhslfhsldk'
print(numpy_arr[0])  # truncated to first 12 characters


ant
elephant
encyclopedia
fsdlfhslfhsl
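For comparison, a hedged sketch of the same assignment against a NumPy array created with `dtype=object`, which stores references in the way described under Reason 2. The names `long_word`, `fixed`, and `flexible` are illustrative only.

```python
import numpy as np

long_word = 'fsdlfhslfhslfhdskfhslfhsldk'

# Fixed-width array: the width is inferred from the longest initial string (12).
fixed = np.array(['ant', 'elephant', 'encyclopedia'])
fixed[0] = long_word

# Object array: each slot holds a reference to a Python str of any length.
flexible = np.array(['ant', 'elephant', 'encyclopedia'], dtype=object)
flexible[0] = long_word

print(fixed[0])     # cut to the first 12 characters
print(flexible[0])  # the full string is kept
```

The trade-off is that object arrays give up the compact, contiguous layout of fixed-width strings in exchange for never losing data on assignment.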
