# **Time Series**

A **time series** is a serially sequenced set of values representing a variable at different points in time (VanLear, “Time Series Analysis”). It consists of measurements collected over time, at **regular time intervals**, on a unit of observation, resulting in a set of ordered values. The regular interval between observations defines the **frequency** of the time series (for instance hourly, weekly, monthly, quarterly, or yearly).
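As a minimal sketch of this definition, the snippet below pairs a few values with regularly spaced time stamps and checks that the spacing is constant. The counts, the start date, and the weekly frequency are made-up assumptions for illustration only.

```python
from datetime import date, timedelta

# Hypothetical weekly counts (made-up values for illustration only).
values = [120, 135, 128, 150, 162, 158]

# Regularly spaced time stamps: one observation every 7 days.
start = date(2024, 1, 1)
step = timedelta(days=7)
timestamps = [start + i * step for i in range(len(values))]

# A time series is the ordered pairing of time stamps and values.
series = list(zip(timestamps, values))

# Check the regularity: all gaps between consecutive time stamps are equal.
gaps = {later - earlier for earlier, later in zip(timestamps, timestamps[1:])}
is_regular = len(gaps) == 1

for t, v in series:
    print(t.isoformat(), v)
print("regular frequency:", is_regular)
```

The set `gaps` contains a single element exactly when the observations are equally spaced, which is what a fixed frequency means.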

Time series data differ from cross-sectional data, which are sets of data observed on a sample of units at a given point in time, or in which the time dimension is not relevant and can be ignored. Cross-sectional data are a **snapshot** of a population of interest at one particular point in time, while time series show the **dynamic** evolution of a variable over time. Panel data combine cross-sectional and time series data by observing the same units over time.
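The three data structures can be sketched with plain Python containers. The unit names, years, and numbers below are hypothetical, and the helper `snapshot` is introduced only for this illustration.

```python
# Hypothetical numbers for illustration only.

# Cross-sectional data: many units, one point in time.
cross_section = {"unit_A": 10, "unit_B": 14, "unit_C": 9}

# Time series data: one unit, many ordered points in time.
time_series = [("2021", 10), ("2022", 12), ("2023", 15)]

# Panel data: many units, each observed at the same points in time.
panel = {
    "unit_A": [("2021", 10), ("2022", 12), ("2023", 15)],
    "unit_B": [("2021", 14), ("2022", 13), ("2023", 16)],
    "unit_C": [("2021", 9), ("2022", 11), ("2023", 11)],
}

def snapshot(panel_data, period):
    """Extract a cross-section (all units at one period) from panel data."""
    return {unit: dict(obs)[period] for unit, obs in panel_data.items()}

print(snapshot(panel, "2021"))   # a cross-section taken from the panel
print(panel["unit_A"])           # a time series taken from the panel
```

Slicing the panel by period gives a cross-section, while slicing it by unit gives a time series.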

**Time** is a fundamental variable in time series, whereas it is often not relevant in other types of statistical analyses. From a sociological (and psychological) perspective as well, past events influence future behaviors. Often, we can make reasonable predictions about future social behaviors just by observing past behaviors. Indeed, the social reproduction of behaviors over time and the predictability of future social behaviors based on past experience and shared knowledge are essential to social order, and thus a fundamental dimension of human society.

From a statistical perspective, repeated measurements over time on a single subject or unit introduce a dependency among data points, which prevents the use of some of the most common statistical techniques. In cross-sectional data, observations are assumed to be independent: values observed on one unit have no influence on values observed on other units. Time series observations have a different nature: a time series is not a collection of independent observations, or of observations taken on independent units, but a collection of successive observations on the same unit. Observations are not taken across units at the same time (or without regard to time), but across time on the same unit.

When dealing with time series data, time is an important factor to be taken into account. It introduces a new dimension to the data. For instance, we can calculate how a variable increases or decreases over time, and whether it peaks at a given moment in time or at regular intervals. We consider not just whether, and how much, a variable is correlated with another variable, but also whether they are correlated over time, whether the peaks in one variable precede the peaks in the other, how much time it takes for a variable to have an impact on another, and how much this impact changes over time.
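The idea that peaks in one variable may precede peaks in another can be sketched with a lagged correlation. In the toy example below the series `x` and `y` are made up, and `y` is built to reproduce `x` two time steps later. The helper names `pearson` and `lagged_correlation` are assumptions for illustration, not functions from a particular library. Note that a simple correlation like this one ignores the serial dependency within each series, so it is a descriptive sketch rather than an inferential test.

```python
from statistics import mean

def pearson(a, b):
    """Pearson correlation between two equally long sequences."""
    ma, mb = mean(a), mean(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    var_a = sum((u - ma) ** 2 for u in a)
    var_b = sum((v - mb) ** 2 for v in b)
    return cov / (var_a * var_b) ** 0.5

def lagged_correlation(x, y, lag):
    """Correlate x at time t with y at time t + lag (lag >= 0)."""
    if lag == 0:
        return pearson(x, y)
    return pearson(x[:-lag], y[lag:])

# Made-up series with recurring peaks.
x = [1, 3, 8, 3, 1, 2, 9, 2, 1, 3, 7, 2, 1, 1]
# y follows x with a delay of two time steps (first two values are padding).
y = [0, 0] + x[:-2]

for lag in range(5):
    print(lag, round(lagged_correlation(x, y, lag), 2))

# The lag with the highest correlation suggests how long x takes to show up in y.
best_lag = max(range(5), key=lambda k: lagged_correlation(x, y, k))
print("best lag:", best_lag)
```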

Importantly, when dealing with time series data, we have to acknowledge that sampling adjacent points in time introduces a **correlation in the data**. This **serial dependency** creates correlated errors, which violate the assumptions of many traditional statistical analyses and can bias the estimation of error for confidence intervals or significance tests. This characteristic of time series data, in general, precludes the direct use of common statistical approaches such as ordinary linear regression and correlation analysis, which assume the observations to be independent.
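Serial dependency can be made visible with the lag-1 autocorrelation, that is, the correlation between each value and the one that precedes it. The sketch below simulates a hypothetical series in which each value carries over 80% of the previous one, then shuffles it to destroy the time order. The simulated process, the coefficient `phi`, and the seed are assumptions chosen only for illustration.

```python
import random
from statistics import mean

def autocorrelation(values, lag=1):
    """Sample autocorrelation of a series at the given lag."""
    m = mean(values)
    denom = sum((v - m) ** 2 for v in values)
    num = sum((values[t] - m) * (values[t + lag] - m)
              for t in range(len(values) - lag))
    return num / denom

rng = random.Random(42)  # fixed seed so the sketch is reproducible

# Simulate a serially dependent series: each value keeps 80% of the
# previous value and adds fresh random noise (hypothetical process).
phi = 0.8
dependent = [0.0]
for _ in range(1999):
    dependent.append(phi * dependent[-1] + rng.gauss(0, 1))

# Shuffling keeps the same values but destroys their time order.
shuffled = dependent[:]
rng.shuffle(shuffled)

r_dependent = autocorrelation(dependent)
r_shuffled = autocorrelation(shuffled)
print(f"lag-1 autocorrelation, original order: {r_dependent:.2f}")
print(f"lag-1 autocorrelation, shuffled order: {r_shuffled:.2f}")
```

In the original order, adjacent values are strongly correlated. After shuffling, the same values behave much like independent observations, which is the situation many traditional techniques assume.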

Due to the peculiarity of time series data, time series analysis has been developed as a specific statistical methodology appropriate for the analysis of time-dependent data. Time series analysis aims at providing an understanding of the underlying processes and patterns of change over time of a unit of observation and the relations between variables observed over time, handling the time structure of the data in a proper way.

# **Time Series Analysis**

Time series analysis is an approach employed in many disciplines. Almost every field of study has data characterized by a development over time, and every phenomenon with a temporal dimension can be conceived as a time series and analyzed through time series analysis methods. Time series analysis is an important part of data analysis in disciplines such as economics, for instance to analyze inflation trends; marketing, to analyze the number of clients of a store or the number of accesses to an e-commerce website; demography, to study the growth of a national population over time or trends in population ageing; engineering, to analyze radio frequencies; and neurology, to analyze brain waves detected through electroencephalograms. Political science can be interested in studying patterns in the alternation of political parties in government, and digital communication can be interested in using time series analysis to study series of tweets using a hashtag, the news media coverage of a certain topic, or trends in users' searches on search engines, such as those provided by Google Trends.

In general, we can distinguish at least the following objectives of a time series analysis study:

* DESCRIPTION: Description of a process characterized by an intrinsic temporal dimension. Simple examples of related questions are: is there an upward trend? Is there a peak at a certain point in time? Is there a regular pattern recurring every year, in a particular moment in time? Descriptive questions like these can be answered via descriptive time series analysis.
* EVALUATION: Evaluation of the impact on a process of a certain event occurring at a particular point in time. For instance: did a change in social media moderation policy, such as those that led to banning accounts linked to conspiracy theories, have an impact on the quantity of fake news shared online by users? Specific time series techniques can be used to perform this kind of analysis.
* EXPLANATION: Explanation of a phenomenon characterized by a time series structure on the basis of related variables. For instance: does the quantity of news shared on Facebook help explain the polarization of the debate online? Does the volume of news media articles on a topic help explain the growth of the debate online on the same topic? Inferential statistical techniques, such as regression models developed for time series, are used to answer questions like these.
* FORECASTING: Prediction of the future values of a process. For instance: can we expect news media coverage of a certain topic to keep growing in the near future? This is the subject of time series forecasting.
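As a rough sketch of the descriptive objective only, the snippet below smooths a made-up series with a simple moving average to look for an upward trend, and locates the time point at which the series peaks. The data, the window size, and the helper `moving_average` are assumptions for illustration. Evaluation, explanation, and forecasting require dedicated techniques that are not shown here.

```python
def moving_average(values, window=3):
    """Average each run of `window` consecutive values."""
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]

# Hypothetical counts observed at eight consecutive time points.
values = [10, 12, 15, 15, 18, 30, 22, 25]

# Smoothing dampens short-term fluctuations and makes the trend easier to see.
smoothed = moving_average(values)
is_upward = smoothed[-1] > smoothed[0]

# Position (0-based) of the highest observed value.
peak_index = max(range(len(values)), key=values.__getitem__)

print("smoothed:", [round(v, 2) for v in smoothed])
print("upward trend:", is_upward)
print("peak at time index:", peak_index)
```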

We can also distinguish between univariate and multivariate time series analysis. Time series analysis can be used to explain the temporal dependencies within and between processes. By temporal dependency within a social process, we mean that the current value of a variable is, in part, a function of previous values of that same variable. Univariate techniques are used to analyze this univariate structure of a time series. Temporal dependency between social processes, conversely, indicates that the current value of a variable is, in part, a function of the previous values of other variables. Multivariate time series techniques are used to explain the relations between time series.
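One simple way to write down these two kinds of dependency, offered here as an illustrative sketch rather than as the only possible formalization, uses a single lag. The symbols $\phi$, $\beta$, and $\varepsilon_t$ are notation introduced for this example: $\phi$ and $\beta$ are coefficients and $\varepsilon_t$ is a random error term. Dependency within a process $y$ can be expressed as

$$
y_t = \phi \, y_{t-1} + \varepsilon_t ,
$$

while dependency between two processes $x$ and $y$ can be expressed as

$$
y_t = \beta \, x_{t-1} + \varepsilon_t .
$$

In the first equation the current value of $y$ depends on its own previous value. In the second it depends on the previous value of another variable.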

# **Stochastic and Deterministic Processes**

A general distinction can be made between time series, based on their deterministic or non-deterministic nature.

A deterministic time series is one that can be explicitly described by an analytic expression. It has no random or probabilistic parts. It is always possible to predict its future behavior exactly, and to state how it behaved in the past. Deterministic processes are pretty rare when dealing with individual and social behaviors! Predicting the future behavior of a crowd, a person, or a social group is sometimes reasonably possible, based on past behaviors and other contextual information, since human behavior is partly influenced by the past. However, it is not totally determined by the past. There is always a certain degree of uncertainty in the prediction; human behaviors are, generally speaking, not fully predictable.
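A deterministic series can be sketched as a plain function of time. The expression below (a linear trend plus a sine wave with a period of 12) and all of its parameter values are arbitrary assumptions chosen for illustration.

```python
import math

def deterministic_value(t, level=5.0, slope=0.5, amplitude=3.0, period=12):
    """Value of a purely deterministic series at time t (no random part)."""
    return level + slope * t + amplitude * math.sin(2 * math.pi * t / period)

# Past, present, and future values follow exactly from the expression.
for t in (-3, 0, 12, 120):
    print(t, round(deterministic_value(t), 3))

# Evaluating the same time point again always gives the same value.
print(deterministic_value(7) == deterministic_value(7))
```

Because the expression has no random component, any value, however far in the past or the future, can be computed exactly.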

Social and individual behaviors, therefore, are non-deterministic. A non-deterministic time series cannot be fully described by an analytic expression. It has some random, or probabilistic, component that prevents its behavior from being explicitly described. It may be possible to say, in probabilistic terms, what its future behavior might be. However, there is always a residual, unpredictable component. A time series may be considered non-deterministic either because not all the information necessary to describe it explicitly is available, although it might be in principle, or because the nature of the generating process, or part of it, is inherently random. We can say that the time series analyzed in social science always have at least a stochastic component that makes them not totally deterministic.

Since non-deterministic time series have a random component, they follow probabilistic rather than deterministic laws. Random data are not defined by explicit mathematical relations, but rather in statistical terms, that is, by probability distributions and parameters such as mean and variance. Non-deterministic time series can be analyzed by assuming that they are manifestations of probabilistic or stochastic processes.
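As a minimal sketch of this idea, the snippet below draws two realizations from the same simple stochastic process: Gaussian white noise with a chosen mean and standard deviation. The process, its parameters, and the seeds are assumptions for illustration only. The individual values differ between realizations, while the sample mean and variance stay close to the parameters that define the process.

```python
import random
from statistics import mean, pvariance

def white_noise(n, mu, sigma, seed):
    """Draw n independent Gaussian values with mean mu and std. dev. sigma."""
    rng = random.Random(seed)
    return [rng.gauss(mu, sigma) for _ in range(n)]

mu, sigma = 0.0, 2.0  # the process is defined by these parameters

# Two realizations of the same process (different seeds, same parameters).
first = white_noise(5000, mu, sigma, seed=1)
second = white_noise(5000, mu, sigma, seed=2)

print("same values:", first == second)
for name, realization in (("first", first), ("second", second)):
    print(name,
          "mean:", round(mean(realization), 2),
          "variance:", round(pvariance(realization), 2))
```

No analytic expression gives the next value of either realization, but both are well described in statistical terms by the same distribution.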