# Low-Level Analysis
Jana Bruses | janabruses@pitt.edu | University of Pittsburgh | Apr. 3rd - ...

This analysis aims to quantify the linguistic markers of Catalan language substitution by measuring specific changes over time. The following exploration looks at lower-level linguistic symptoms (morphological, syntactic, lexical, and semantic). For higher-level text characteristics that could indicate increasing convergence with Spanish, see notebook ...[link]()\
The data collection and dataframe-building process used for this analysis can be found in [Data-Parsing-Exploratory-Analysis-2](https://github.com/Data-Science-for-Linguists-2025/Linguistic-Markers-Catalan-Substitution/blob/main/Data-Parsing-Exploratory-Analysis-2.ipynb)

In [13]:
# loading libraries
import pandas as pd

In [14]:
# loading the pickled complete dataframe 
tokscomplete_df = pd.read_pickle("tokcomplete_df.pkl")

In [15]:
# keeping only the columns we are interested in
tokscomplete_df = tokscomplete_df[["Year", "Line_id", "Text", "Text_len", "toks", "Len_toks"]]

In [37]:
# quick recap of the dataframe (shape[0] is the number of rows, i.e. pieces of text)
print("There are", tokscomplete_df.shape[0], "pieces of text")
print("The total of tokens in the dataframe is:", tokscomplete_df["Len_toks"].sum())
print("The oldest text is from", tokscomplete_df["Year"].min())
print("The most recent text is from", tokscomplete_df["Year"].max())

There are 75480 pieces of text
The total of tokens in the dataframe is: 3063943
The oldest text is from 1860
The most recent text is from 2022


In [38]:
tokscomplete_df.describe()

| | Year | Text_len | Len_toks |
|----|----|----|----|
| count | 75480.0 | 75480.0 | 75480.0 |
| mean | 2008.832658 | 187.739043 | 40.59278 |
| std | 3.064608 | 611.35814 | 122.724208 |
| min | 1860.0 | 0.0 | 0.0 |
| 25% | 2008.0 | 38.0 | 9.0 |
| 50% | 2009.0 | 95.0 | 21.0 |
| 75% | 2010.0 | 222.0 | 48.0 |
| max | 2022.0 | 73881.0 | 14727.0 |


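**Comment:**\
The corpus is heavily concentrated in recent years: the quartiles of `Year` fall between 2008 and 2010, even though the oldest text dates from 1860 and the most recent from 2022. Text length is strongly right-skewed: the median text has 21 tokens and the mean is about 41, while the longest text has 14,727 tokens. The minimum of 0 for both `Text_len` and `Len_toks` shows that at least one row contains empty text.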
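
Two things are worth checking before comparing years: how many texts and tokens each year contributes, and how many rows are empty. The following is a minimal sketch, assuming only the `Year` and `Len_toks` columns kept above. The helper name `summarize_coverage` and the toy rows are hypothetical and only illustrate the idea; they are not taken from the corpus.

```python
import pandas as pd


def summarize_coverage(df, year_col="Year", len_col="Len_toks"):
    """Return per-year text/token totals and the number of empty rows."""
    # One row per year: how many texts it has and how many tokens they add up to.
    per_year = df.groupby(year_col)[len_col].agg(texts="size", tokens="sum")
    # Share of all texts that each year contributes.
    per_year["share_of_texts"] = per_year["texts"] / per_year["texts"].sum()
    # Rows whose token count is zero (empty texts).
    n_empty = int((df[len_col] == 0).sum())
    return per_year, n_empty


# Toy data standing in for tokscomplete_df (hypothetical values).
toy_df = pd.DataFrame({
    "Year": [1860, 2008, 2008, 2009],
    "Len_toks": [120, 9, 0, 21],
})
per_year, n_empty = summarize_coverage(toy_df)
print(per_year)
print("Empty rows:", n_empty)
```

On the real data the call would be `summarize_coverage(tokscomplete_df)`. Years with very few tokens are likely to produce unstable frequencies, so it may be worth grouping them or reporting them separately.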

## 1. Loss of word classes - pronoms febles

Following Dr. Junyent's idea that one of the key warning signs of language endangerment is the loss of word classes, we will conduct a frequency exploration of pronoms febles (weak pronouns) as our target word class. Specifically, we will take a close look at the pronouns **"en"** and **"hi"**, which have no equivalent in Spanish and might be dropped to mirror Spanish structures.\
The following examples and explanations are inspired by, guided by, and translated from [els-pronoms-son-vida](https://www.vilaweb.cat/noticies/els-pronoms-son-vida/), a Catalan page that does an amazing job of presenting these cornerstones of Catalan.

### **"en"** & **"hi"**

A bit of grammatical background on "en" and "hi".\
These pronouns substitute complements of the verb.

**"en"**\
**(a)** Used to replace complements introduced by “de”\
Example:\
Viuen de la por de la gent. ---> En viuen\
*They live off people's fear. ---> They live off it.*\
"en" stands in for the whole complement: "de" (*from*) + "la por de la gent" (*people's fear*)

**(b)** Used to replace direct objects without an article\
Example:\
He comprat masses coses. En tornaré dues.\
*I've bought too many things. I'll return two of them.*\
"en" = of the things I bought\
"masses coses" is the direct object, with no article preceding it; "he comprat" is the verb form.

**"hi"**\
Used to replace complements introduced by any preposition other than "de"\
Example:\
Demà anem al pis. ---> Demà hi anem\
*We are going to the flat tomorrow. ---> We are going "to it" tomorrow.*\
"hi" = al pis / *to the flat*

These are some of the most common situations in which these pronouns are dropped (form that drops the pronoun -> standard form):
* He redactat totes les cartes. T’envio dues -> Te n’envio dues
* Sé que t’esperen a la piscina, però no pots anar -> no hi pots anar
* L’espien i ell no s’adona -> no se n’adona
* Demà hi ha la festa, però jo no estaré -> no hi seré
* Sí que està, però ara no es pot posar -> Sí que hi és, però ara no s’hi pot posar
* Avui pot haver una desgràcia -> pot haver-hi una desgràcia

Less common examples are:\
No tenim constància de la petició, però avui mirarem de saber alguna cosa ---> saber-ne alguna cosa\
He llegit el text, però no he canviat res ---> no hi he canviat res\
Si et sembla bé, parlem-ho demà ---> parlem-ne demà\
Volia pa, però no vaig pensar a comprar-lo ---> a comprar-ne\
És la bicicleta, o bici, digueu-li com vulgueu ---> digueu-ne com vulgueu\
A sota de cada objecte, escriviu el nom que li correspongui ---> que hi correspongui

Sentences built under Spanish influence, with a possessive, a stressed pronoun, or a demonstrative where Catalan would use a weak pronoun:\
Quan va veure aquell automòbil es va enamorar d’ell ---> se’n va enamorar\
Vam trobar una botiga i ens vam arrecerar en ella ---> ens hi vam arrecerar\
Aquell pintor? No recordo el seu nom ---> No en recordo el nom\
Va conèixer Fuster el 1970 i es va començar a relacionar amb aquest ---> s’hi va començar a relacionar\
Diu que s’ha limitat a escoltar les propostes, però no fa cap valoració al respecte ---> no en fa cap valoració

Looking at these examples, there are some surface forms that we can scan the text for:

| Pronom feble | Other forms |
|----|----|
| "en" | "n'", "-ne", "se'n" |
| "hi" | "-hi" |
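
One way to scan for the forms in the table above is with regular expressions over the raw `Text` column, then normalising the counts by `Len_toks` per year. The following is a minimal sketch under several assumptions: the patterns `EN_RE` and `HI_RE` and the helper `pronoun_rates_by_year` are hypothetical names, the toy rows are invented for illustration, and both straight and curly apostrophes are accepted. Note that a bare "en" is also a preposition and a personal article in Catalan, so the "en" counts are an upper bound until those uses are filtered out.

```python
import re

import pandas as pd

# "en" family from the table: en, n', -ne, se'n (straight or curly apostrophe).
# Caveat: the bare "en" alternative also matches the preposition/article "en".
EN_RE = re.compile(r"\ben\b|\bn['’]|-ne\b|\bse['’]n\b", re.IGNORECASE)
# "hi" family: \bhi\b matches standalone "hi" and also "-hi" / "s'hi",
# because the hyphen and the apostrophe are not word characters.
HI_RE = re.compile(r"\bhi\b", re.IGNORECASE)


def pronoun_rates_by_year(df, text_col="Text", year_col="Year", len_col="Len_toks"):
    """Count "en"/"hi" forms per year and express them per 1,000 tokens."""
    texts = df[text_col].fillna("").astype(str)
    counts = pd.DataFrame({
        year_col: df[year_col].to_numpy(),
        "en": [len(EN_RE.findall(t)) for t in texts],
        "hi": [len(HI_RE.findall(t)) for t in texts],
        len_col: df[len_col].to_numpy(),
    })
    yearly = counts.groupby(year_col)[["en", "hi", len_col]].sum()
    # Skip years with no tokens to avoid dividing by zero.
    yearly = yearly[yearly[len_col] > 0].copy()
    yearly["en_per_1k"] = 1000 * yearly["en"] / yearly[len_col]
    yearly["hi_per_1k"] = 1000 * yearly["hi"] / yearly[len_col]
    return yearly


# Toy rows standing in for tokscomplete_df (hypothetical values).
toy_df = pd.DataFrame({
    "Year": [2008, 2008, 2010],
    "Text": ["Demà hi anem", "Te n'envio dues", "No se n'adona i no hi pot anar"],
    "Len_toks": [3, 3, 8],
})
rates = pronoun_rates_by_year(toy_df)
print(rates)
```

On the real data the call would be `pronoun_rates_by_year(tokscomplete_df)`. Working on raw text rather than on the `toks` column avoids depending on how the tokenizer splits clitics such as "n'" or "-hi", which is not specified here.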