## 🤗 Finetune **Longformer Encoder-Decoder (LED)** on 8K Tokens 🤗

The *Longformer Encoder-Decoder (LED)* was recently added as an extension to [Longformer: The Long-Document Transformer](https://arxiv.org/abs/2004.05150) by Iz Beltagy, Matthew E. Peters, and Arman Cohan.

In this notebook we will finetune *LED* for summarization on the arXiv subset of [scientific_papers](https://huggingface.co/datasets/viewer/?dataset=scientific_papers). It is a long-range summarization dataset, which makes it a good candidate for LED. LED will be finetuned with input lengths of up to 8K tokens on a single GPU.

We will leverage 🤗`Seq2SeqTrainer`, gradient checkpointing, and, as usual, 🤗`datasets`.

First, let's try to get a GPU with at least 15GB RAM.

In [None]:
# crash colab to get more RAM
# !kill -9 -1

To check that we have enough RAM, we can inspect the GPU that was allocated (for example with `!nvidia-smi`).
If the randomly allocated GPU is too small, the cell above can be run
to crash the notebook in the hope of getting a better GPU.
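
A minimal sketch of such a check from Python, assuming the `nvidia-smi` command-line tool is on the path (as it typically is on Colab GPU runtimes). The helper names `parse_gpu_memory` and `query_gpus` and the `MIN_MIB` threshold are hypothetical and not part of the original notebook:

```python
import shutil
import subprocess

# Approximate threshold for the ~15 GB target mentioned above (assumption).
MIN_MIB = 15_000


def parse_gpu_memory(csv_text):
    """Parse `name, memory.total` CSV rows (MiB) into (name, mib) tuples."""
    gpus = []
    for line in csv_text.strip().splitlines():
        if not line.strip():
            continue
        # Split on the last comma so GPU names containing commas stay intact.
        name, mib = line.rsplit(",", 1)
        gpus.append((name.strip(), int(mib.strip())))
    return gpus


def query_gpus():
    """Return the GPUs reported by nvidia-smi, or [] if the tool is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=False,
    )
    if result.returncode != 0:
        return []
    return parse_gpu_memory(result.stdout)


for name, mib in query_gpus():
    status = "ok" if mib >= MIN_MIB else "probably too small"
    print(f"{name}: {mib} MiB ({status})")
```

If nothing is printed, no GPU was detected, and the runtime type should be checked before continuing.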

Next, we install 🤗Transformers, 🤗Datasets, and `rouge_score`.



In [None]:
%%capture
!pip install datasets
!pip install transformers
!pip install rouge_score
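
Because `%%capture` hides the installer output, it can be worth confirming that the packages are importable before moving on. A small sketch using only the standard library; the helper name `missing_packages` is hypothetical, and the import names are assumed to match the package names installed above:

```python
import importlib.util


def missing_packages(names):
    """Return the subset of top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# An empty list means all three packages can be imported.
print(missing_packages(["datasets", "transformers", "rouge_score"]))
```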

Let's start by loading and preprocessing the dataset.



In [None]:
from datasets import load_dataset, load_metric

Next, we download the arXiv train and validation splits ([click to see on 🤗Datasets Hub](https://huggingface.co/datasets/scientific_papers)) and keep a small subset of each (400 training and 100 validation examples). This can take a couple of minutes **☕**.

In [None]:

tr_dataset = load_dataset("scientific_papers", "arxiv", split="train")
train_dataset = tr_dataset.select(range(400))

Downloading builder script:   0%|          | 0.00/5.35k [00:00<?, ?B/s]

Downloading metadata:   0%|          | 0.00/4.99k [00:00<?, ?B/s]

Downloading readme:   0%|          | 0.00/8.27k [00:00<?, ?B/s]

Downloading data files:   0%|          | 0/2 [00:00<?, ?it/s]

Downloading data:   0%|          | 0.00/3.62G [00:00<?, ?B/s]

Downloading data:   0%|          | 0.00/880M [00:00<?, ?B/s]

Extracting data files:   0%|          | 0/2 [00:00<?, ?it/s]

Generating train split:   0%|          | 0/203037 [00:00<?, ? examples/s]

Generating validation split:   0%|          | 0/6436 [00:00<?, ? examples/s]

Generating test split:   0%|          | 0/6440 [00:00<?, ? examples/s]

In [None]:
v_dataset = load_dataset("scientific_papers", "arxiv", split="validation")
val_dataset = v_dataset.select(range(100))

It's always a good idea to take a look at some data samples. Let's do that here.

In [None]:
import datasets
import random
import pandas as pd
from IPython.display import display, HTML

def show_random_elements(dataset, num_examples=4):
    assert num_examples <= len(dataset), "Can't pick more elements than there are in the dataset."
    picks = []
    for _ in range(num_examples):
        pick = random.randint(0, len(dataset)-1)
        while pick in picks:
            pick = random.randint(0, len(dataset)-1)
        picks.append(pick)

    df = pd.DataFrame(dataset[picks])
    for column, typ in dataset.features.items():
        if isinstance(typ, datasets.ClassLabel):
            df[column] = df[column].transform(lambda i: typ.names[i])
    display(HTML(df.to_html()))
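
The loop in `show_random_elements` keeps drawing indices until it has `num_examples` distinct ones. As an aside, the standard library's `random.sample` produces distinct indices in a single call. The sketch below illustrates that alternative; `pick_unique_indices` is a hypothetical helper name, not part of the original notebook:

```python
import random


def pick_unique_indices(dataset_len, num_examples=4):
    """Return `num_examples` distinct indices in [0, dataset_len)."""
    assert num_examples <= dataset_len, "Can't pick more elements than there are in the dataset."
    # random.sample draws without replacement, so no duplicate check is needed.
    return random.sample(range(dataset_len), num_examples)


picks = pick_unique_indices(100, num_examples=4)
print(picks)
```

The resulting list can be used in place of `picks` when indexing the dataset, for example `dataset[picks]`.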

In [None]:
show_random_elements(val_dataset)

Unnamed: 0,article,abstract,section_names
0,"there is growing experimental evidence of anisotropic forms of superconductivity in the quasi two - dimensional perovskite oxides .\nenergy gaps of d - wave character have been established for some of the copper oxides that have strongly enhanced antiferromagnetic susceptibilities and high superconducting transition temperatures ( of the order of @xmath1)@xcite . on the other hand ,\np - wave spin - triplet pairing provides a better understanding of the experimental data in the ruthanate @xmath0 that appears to be close to ordering ferromagnetically and becomes superconducting only at low temperature ( of the order of @xmath2 ) @xcite .\nnumerous mechanisms have been proposed for anisotropic superconductivity , especially in the cuprates .\none of the most extensively investigated theoretically is based on a magnetic interaction arising via the exchange of enhanced antiferromagnetic spin - fluctuations @xcite .\nthough not entirely without difficulties , this mechanism correctly anticipated from the beginning the d - wave symmetry of the order parameter observed in some of the copper oxides .\nmoreover , when treated in the mean - field eliashberg theory with full momentum dependence of the electron self - energy , it provided an account of the high transition temperatures in the cuprates , in terms of parameters determined independently from normal state properties alone . 
in this paper\nwe include the case where , in contrast to the cuprates , a magnetic interaction between electron quasiparticles arises from the exchange of ferromagnetic instead of antiferromagnetic spin - fluctuations in quasi two - dimensional ( 2d ) compounds .\nour calculations differ from those previously reported@xcite for p - wave triplet pairing in the following ways : ( i ) they concern quasi 2d rather than 3d systems , ( ii ) employ a non - parabolic band structure which has potential relevance to real compounds , and ( iii ) make use of the full green s function in place of a simple pole approximation for the propagator .\nthe latter ( iii ) takes a better account of the momentum dependence of the electron self - energy and was found to be important in the nearly antiferromagnetic 2d systems@xcite .\ncomparisons of the mean - field eliashberg equations for nearly ferromagnetic and nearly antiferromagnetic metals with a single 2d fermi surface are presented for a range of parameters defining the magnetic interaction in potentially realistic cases .\nthe results show that the incipient ferromagnets are expected to have p - wave ( spin - triplet ) pairing and transition temperatures that are much lower than in the nearly antiferromagnetic metals for otherwise similar conditions .\na physical interpretation of the numerical analyses is given together with a discussion of the possible relevance of the magnetic interaction model for @xmath0 .\nthe mean - field analysis is intended as a first step toward a more complete treatment of superconductivity in highly correlated electron systems .\nit may also serve as a possible guide to future experiments to test for the existence of magnetically mediated superconductivity in general . the outline of the paper goes as follows .\nin the next section we describe the model and computational method used in this work . 
in section iii\n, we describe the results of the numerical calculations for both ferromagnetically and antiferromagnetically correlated metals .\nsection iv contains further discussion while our conclusions are presented in the final section .\nwe consider quasiparticles on a two - dimensional square lattice .\nwe assume that the dominant scattering mechanism is of magnetic origin and postulate the following low - energy effective action for the quasiparticles : @xmath3 the spin density @xmath4 is given by @xmath5 where @xmath6 denotes the three pauli matrices .\nthe quasiparticle dispersion relation is @xmath7 with hopping matrix elements t and t. @xmath8 denotes the chemical potential , @xmath9 the inverse temperature , @xmath10 the coupling constant and @xmath11 and @xmath12 are grassmann variables . in the following\nwe shall measure temperatures , frequencies and energies in the same units .\nhaving in mind a possible connection to @xmath0 , we shall model the sheet of the fermi surface of that material thought to be the most relevant for superconductivity@xcite by choosing t=0.45 t . with an average fermi wavevector of @xmath13 and a lattice constant @xmath14\n, luttinger s theorem gives a doping @xmath15 . in the following\n, we shall adopt the value @xmath16 .\nthe fermi surface is shown in fig.(1 ) .\nprevious studies of the dependence of the critical temperature on the ratio t/t and doping level@xcite have shown the relative insensitivity of @xmath17 to small changes in these parameters .\ntherefore , a more realistic description of the fermi surface sheet of @xmath0 is not expected to alter our conclusions .\nwe also note that deviations from the assumed 2d form of the fermi surface sheet is found experimentally to be small .\nour model assumes that the coupling parameter g is constant .\nthe @xmath18 dependence in the simplest case arises from the atomic form factor . 
for tight binding bands\nthe latter is local in space and this leads to a weak dependence of g on @xmath18 .\nmoreover , near a magnetic instability the dominant @xmath18 dependence of the interaction is expected to arise from @xmath19 , rather than that of g. the retarded generalized magnetic susceptibility @xmath19 that defines the effective interaction , eq .\n( [ seff ] ) , is assumed to take the phenomenological form @xmath20 @xmath21 and @xmath22 are the inverse correlation lengths ( in units of @xmath23 ) with and without of strong magnetic correlations respectively .\nlet @xmath24 in the case of ferromagnetic correlations , the parameters @xmath25 and @xmath26 are defined as @xmath27 where @xmath28 is a characteristic spin - fluctuation temperature .\nwe shall also investigate antiferromagnetic correlations , in which case these parameters take the form @xmath29 the spin - fluctuation propagator on the imaginary axis , @xmath30 is related to the imaginary part of the response function @xmath31 , eq .\n( [ chiml ] ) , via the spectral representation @xmath32 to get @xmath30 to decay as @xmath33 as @xmath34 , as it should , we introduce a cutoff @xmath35 and take @xmath36 for @xmath37 .\na natural choice for the cutoff is @xmath38 .\nwe have checked that our results for the critical temperature are not sensitive to the particular choice of @xmath35 used .\nthe two - dimensional eliashberg equations for the critical temperature @xmath17 in the matsubara representation reduce , for the effective action eq .\n( [ seff ] ) , to @xmath39 @xmath40 @xmath41 { t\over n}\sum_{\omega_n}\sum_{\bf k } \chi({\bf p } - { \bf k},i\omega_n - i\omega_n ) \lambda(t ) & = & 1 \longrightarrow t = t_c \label{gap}\end{aligned}\ ] ] where @xmath42 is the quasiparticle self - energy , @xmath43 the one - particle green s function and @xmath44 the anomalous self - energy .\n@xmath45 is the bare quasiparticle spectrum , eq .\n( [ eps ] ) , @xmath8 the chemical potential that is 
adjusted to give an electron density of @xmath16 , and @xmath46 the total number of allowed wavevectors in the brillouin zone . in eq .\n( [ gap ] ) , the prefactor @xmath47 is for triplet pairing while the prefactor @xmath48 is appropriate for singlet pairing .\nonly the longitudinal spin - fluctuation mode contributes to the pairing amplitude in the triplet channel and gives rise to an attractive interaction .\nboth transverse and longitudinal spin - fluctuation modes contribute to the pairing amplitude in the singlet channel and give an interaction which is repulsive in reciprocal space with a peak at @xmath49 .\nwhen fourier transformed , such an potential is repulsive on one sublattice ( even sites ) and attractive on the other ( odd sites ) .\nall three modes contribute to the quasiparticle self - energy .\nthe momentum convolutions in eqs .\n( [ sigma],[gap ] ) are carried out with a fast fourier transform algorithm on a @xmath50 lattice .\nthe frequency sums in both the self - energy and linearized gap equations are treated with the renormalization group technique of pao and bickers@xcite .\nwe have kept between 8 and 16 matsubara frequencies at each stage of the renormalization procedure , starting with an initial temperature @xmath51 and cutoff @xmath52 .\nthe renormalization group acceleration technique restricts one to a discrete set of temperatures @xmath53 . the critical temperature at which @xmath54 in eq .\n( [ gap ] ) is determined by linear interpolation .\nthe savings in computer time and memory requirements afforded by this technique allowed us to study a wide range of temperatures and spin - fluctuation spectrum parameters .\nthe dimensionless parameters at our disposal are @xmath55 , @xmath56 , @xmath22 and @xmath21 .\nit is found experimentally that @xmath57 , and we shall use this relation to eliminate one parameter from the set and pick a representative value of the product @xmath58 . 
a value of @xmath59 corresponds to about @xmath60 for a bandwidth of 1 ev while a value of @xmath61 is representative of what one obtains from a lindhard function with 2d parabolic bands for a fermi momentum of about @xmath62 .\nthe parameters of the model can in principle be inferred from the electronic structure , the dynamical magnetic susceptibility , and the resistivity in the normal state .\nthe resistivity in particular may be used to estimate the dimensionless coupling parameter @xmath55 , the value of which is between 10 and 20 for the simplest rpa approximation for the magnetic interaction potential .\nthe results of our numerical calculations of the mean - field critical temperature @xmath17 in the case of a nearly ferromagnetic metal are shown in figs.(2),(3 ) and ( 4 ) for various values of the characteristic spin - fluctuation temperature @xmath28 .\nwe find an instability for a p - wave gap function @xmath44 transforming as @xmath63 ( or @xmath64 , the two being degenerate for a square lattice ) .\nfigs.(2a),(3a ) and ( 4a ) show @xmath17 versus the dimensionless coupling parameter @xmath55 for several values of the square of the inverse correlation length parameter @xmath65 while figs.(2b ) , ( 3b ) and ( 4b ) show @xmath17 versus @xmath65 for several values of the coupling parameter @xmath55 .\nthe parameter @xmath65 can be varied experimentally by applying pressure to the samples .\nthe @xmath17 versus @xmath65 graphs can be interpreted as @xmath17 versus pressure plots , with the critical pressure corresponding to the quantum critical point at @xmath66 .\nthe critical temperature saturates , in the strong coupling limit , to a value of about @xmath67 for values of @xmath65 of 0.5 to 1.0 . for long correlation lengths\n, @xmath17 decreases . 
for fixed coupling constant @xmath55\n, we find that the eliashberg renormalization factor @xmath68 increases as @xmath65 decreases and thus pair - breaking effects tend to cancel the stronger attraction as @xmath69 , leading to the reduction of the transition temperature . for short correlation lengths , @xmath17 is reduced as well since in that case the p - wave component of the pairing interaction becomes very small as it is nearly momentum independent for large values of @xmath65\n( 2b),(3b ) and ( 4b ) show that for larger values of the characteristic spin - fluctuation frequency @xmath28 , the critical temperature is more sensitive to changes in @xmath65 . our results for the mean - field transition temperature @xmath17 to a @xmath70 superconducting state\n( @xmath44 transforming as @xmath71 ) for antiferromagnetic spin - fluctuations are shown in fig.(5 ) . comparing with the results diplayed in fig.(3 )\n, one sees that for identical values of the characteristic spin - fluctuation temperature @xmath28 , the d - wave transition temperature saturates to a value of about @xmath72 for values of @xmath65 of 0.5 to 1.0 , a factor of ten or so larger than their p - wave counterparts .\none also observes from figs.(3a ) and ( 5a ) that @xmath17 saturates much more rapidly to its largest value as @xmath73 is increased in the antiferromagnetic case than it does for ferromagnetic spin - fluctuations .\none sees from figs.(3b ) and ( 5b ) that the transition temperature is much less sensitive to changes in @xmath65 in the d - wave case than it is for p - wave superconductivity . as the inverse correlation length @xmath65\nis reduced , the mean - field @xmath17 is much more robust for antiferromagnetic spin - fluctuations than for their ferromagnetic counterparts , indicating that pair - breaking effects are not as damaging in the former case . 
the eliashberg renormalization factor @xmath74 is shown in figs.(6 ) and ( 7 ) versus wavevector @xmath75 for ferromagnetic and antiferromagnetic spin - fluctuations for @xmath76 .\nthe average of @xmath74 over the fermi surface as a function of @xmath65 is shown in fig.(8 ) for the ferromagnetic case for several values of the coupling parameter @xmath55 .\nwe point out that even in the ferromagnetic case , @xmath77 is strongly anisotropic around the fermi surface when the coupling parameter is small ( fig.(6a ) ) and becomes more isotropic in the strong coupling limit ( fig.(7a ) ) .\nthe anisotropy for small coupling parameter can be understood as a density of states effect , since the smaller fermi velocity near the @xmath78 point can account for a larger value of @xmath77 in this region of the brillouin zone .\nthese effects should matter less in the strong couping limit . on the other hand , for antiferromagnetic spin - fluctuations\n, the anisotropy of @xmath77 increases as the coupling parameter is increased ( see figs.(6b ) and ( 7b ) ) .\nfinally , as shown in fig.(8 ) for nearly ferromagnetic systems , @xmath77 increases rapidly and tends to diverge as the inverse correlation length @xmath79 .\nthe magnetic interaction potential , eqs .\n( [ chiml , gap ] ) is attractive everywhere for the ferromagnetic case , but oscillates in space from attractive ( odd sites ) to repulsive ( even sites ) in nearly antiferromagnetic metals . 
since the average potential in the latter case tends to cancel , it may seem surprising at first sight that pairing is so much more effective in nearly antiferromagnetic than ferromagnetic metals .\npart of the explanation lies in the fact that the inner product of the spins of two interacting quasiparticles , @xmath80 that enters the pairing potential , is on average three times larger in magnitude for the spin singlet than the spin triplet state for spin @xmath81 particles ( classically the expectation value would of course be the same in both cases ) .\nthus the ferromagnetic interaction potential , though everywhere attractive , is for this reason alone , three times weaker than the antiferromagnetic potential .\none can make this argument more quantitative and solve the eliashberg equations for the nearly ferromagnetic metal assuming only the longitudinal spin - fluctuation mode contributes to the self - energy , setting the coupling parameter @xmath82 in eq .\n( [ sigma ] ) ( the ising case ) .\nthe results of the calculations for a spin - fluctuation temperature @xmath28 equal to two thirds of the nearest neighbor hopping energy @xmath83 are shown in fig.(9 ) and to be compared with the results shown in figs.(3 ) and ( 5 ) .\nwhile the critical temperatures of the nearly ferromagnetic ising metal are much higher than those of the the nearly ferromagnetic one for similar conditions , they do not quite match those of the nearly antiferromagnetic case .\ntherefore , the factor of three in the pairing potential is not the whole story .\nthe extra factor of @xmath84 from landau damping in eq .\n( [ ferro ] ) leads to greater incoherent scattering for a nearly ferromagnetic than antiferromagnetic metal , and hence to a reduced @xmath17 .\nwe have also solved the eliashberg equations for the nearly ferromagnetic ising metal without landau damping ( with @xmath85 , in eq .\n( [ ferro ] ) ) .\nthe results for the same value of @xmath28 are shown in fig.(10 ) .\none 
might have expected that the ising case without the landau damping would lead to transition temperatures for the purely attractive potential and p - wave pairing that are much higher than from the spatially oscillatory potential and d - wave pairing . that this is not the case , as may be seen by comparing figs.(5 ) and ( 10 ) ,\ncan be understood when one takes into account of the effects of retardation that restricts scattering to states within a narrow range of wavevectors near the fermi surface .\nthis implies that the pair wavefunction tends to oscillate in space with wavevector of the order of @xmath86 and the probability distribution with wavevector @xmath87 , i.e with a wavevector comparable to that of the magnetic interaction potential itself .\nfurthermore , the maxima of the probability appear near the minima of the potential along the square axes , while in the d - wave state , the probability vanishes alltogether along the diagonals where the interaction is everywhere repulsive . in this case\nthe effect of the repulsive regions is small and the gain achieved with a purely attractive potential with otherwise similar properties is not as great as might have naively been suspected . beside this\nthere remains at least one more significant difference between the ferromagnetic and antiferromagnetic cases that may be relevant to pair formation but is not readily quantified . 
in the latter case\nthe mass renormalization is much more anisotropic than in the former and is strongest at points on the fermi surface ( the hot spots ) connected by the antiferromagnetic wavevector .\nthis anisotropy may lead to strong coupling effects which on the whole are less damaging to pairing than in the corresponding ferromagnetic case where essentially all the points of the fermi surface are equivalent .\ntaken together , these effects confer a very considerable advantage for pairing to the nearly antiferromagnetic versus ferromagnetic metals that have otherwise comparable properties .\nfurther considerations also lead to an advantage of quasi 2d over 3d metals .\nthe average of the spin - fluctuation frequency in the brillouin zone tends to be larger in 2d than in 3d .\nthis favors enhanced incoherent scattering , and hence reduced @xmath17 .\nhowever , it also leads to an enhanced pairing energy and greater robustness against impurities and the effects of competing channels of interactions .\nwe expect that these latter considerations will normally tend to dominate and hence favor quasi 2d over 3d systems , under otherwise similar conditions , and provided that corrections to the mean - field solutions are not important . within the magnetic interaction model in the mean - field approximation ,\nthus , the highest @xmath17 is expected to arise in quasi 2d metals with high @xmath28 and on the border of a continuous antiferromagnetic transition ( when the magnetic correlation wavevector @xmath79 as @xmath88 ) .\ninterestingly these conditions are well satisfied in the copper oxides but much less so in the heavy fermion and organic compounds ( see e.g. refs @xcite and @xcite respectively ) . 
in the heavy fermions\n@xmath28 happens to be low because the f electrons produce narrow bands , while in the organics @xmath28 is small because the carrier concentration is low .\nthus one expects , and indeed one finds , much lower @xmath17 s in these materials than in the cuprates .\nthe calculations also predict that magnetically mediated superconductivity should be a general phenomenon occuring on the boundary of a continuous magnetic transition , in both ferromagnets and antiferromagnets and in quasi 2d and 3d compounds .\nthis may not be observed in practice , however , due to pair breaking effects of impurities and other interaction channels not considered here explicitly . in cases when the magnetic transistion is not abrupt and @xmath21 can be made arbitrarily small at low temperatures , the magnetic interaction potential may overwhelm these other effects and , at least in the nearly antiferromagnetic case where the mean - field @xmath17 appears to remain finite as @xmath79 , superconductivity may survive in a narrow range of lattice densities near the critical density where the magnetic order is continuously quenched .\nthe magnetic interaction model and the mean - field approximation for @xmath17 might be expected to apply most successfully in nearly magnetic metals where @xmath17 is small compared to the electronic bandwidth and @xmath28 .\nthe nearly ferromagnetic quasi 2d metal @xmath0 that orders in a spin - triplet p - wave state only at very low temperatures ( below @xmath89 ) , @xcite would therefore seem an ideal candidate for comparison between theory and experiment .\nthe calculations presented in this paper provide a first step toward such a comparison .\nthe next will be to build a realistic model of @xmath19 from nmr and neutron scattering measurements , or from the numerical calculations now in progress@xcite .\npreliminary evidence suggests that @xmath0 may be close both to ferromagnetism and antiferromagnetism@xcite .\nthe competition 
between these two tendencies , along with the comparatively small magnitude of @xmath90 in the observed spin - triplet state and other features as discussed above , may help to account for the much lower @xmath17 in this layer perovskite oxide compared with that of the cuprates .\nwe note that our calculations may be expected to break down when the mass renormalization becomes large at high values of the coupling constant or at small @xmath21 near the critical point for magnetic order .\nalso it should fail when the superconducting coherence length becomes small compared with the average spacing between cooper pairs , i.e. for sufficiently high @xmath17 or in _ strictly _ 2d where there is no true long - range order at finite temperature .\nthe latter condition is not readily reached in many of the known quasi 2d systems .\nfinally , we emphasize that our model for the magnetic interaction does not include any possible spin - gap formation .\nfor this reason alone , it is not expected to apply near the metal - insulator phase boundary in the cuprates@xcite .\nwe have contrasted the predictions for the superconducting transition temperature for magnetically mediated superconductivity for nearly ferromagnetic versus nearly antiferromagnetic metals in quasi 2d .\nthe calculations are based on a single fermi surface sheet , and a conventional form for the magnetic interaction arising from the exchange of spin fluctuations treated in the mean - field eliashberg theory .\nthe dominant @xmath18 and @xmath91 dependence of this interaction is assumed to arise from the dynamical wavevector dependence of the susceptibility , and thus the interaction vertex is taken to be a phenomenological constant . 
in principle the latter quantities may be inferred independently from inelastic neutron scattering and for example the temperature dependence of the resistivity in the normal state .\nthe mean - field eliashberg theory is expected to break down when , for example , @xmath17 is so high that the superconducting coherence length becomes small and less than the typical spatial separation of cooper pairs .\nit may also fail in the immediate vicinity of the critical density when magnetic order is quenched continuously and the quasiparticle density of states tends to become singular . here\nthe electron quasiparticle framework underpinning the mean - field eliashberg model may break down in an essential and non - trivial fashion . within the range of validity of our calculations\nwe may conclude that , for the same set of dimensionless parameters , the p - wave triplet pairing in nearly ferromagnetic metals is much less robust than the d - wave singlet pairing in the corresponding nearly antiferromagnetic metals . 
for values of @xmath28 that are typical of d metals in the layered perovskites\n, we predict a maximum of @xmath17 versus @xmath21 of the order of @xmath1 in the latter but typically one or more orders of magnitude less than this in the former .\nthe reasons for the dramatic difference are discussed in section iv .\nthe pair breaking effects of impurities and of competing interaction channels can lead to substantially lower values than the above in real materials .\nthese effects may , however , be mitigated by reducing @xmath21 via some external control parameter such as pressure and hence enhancing the magnetic pairing energy .\nthe calculations are intriguing in the light of the d - wave singlet state observed in the cuprates with strongly enhanced antiferromagnetic susceptibilities and @xmath17 s of the order of @xmath1 , versus the p - wave triplet state found in the ruthanate @xmath0 that is close to ferromagnetic order and has a much lower @xmath17 ( of the order of @xmath2 ) .\nthe maximum of @xmath17 versus @xmath21 in @xmath0 in the triplet state is not yet known , and may well be higher than that measured at ambient pressure .\na more complete description of @xmath0 must await realistic modelling of the dynamical susceptibility which may reflect not only ferromagnetic but also competing antiferromagnetic tendencies .\nthe latter may be subdominant at ambient pressure but may be highly sensitive to lattice spacing .\nit would also be interesting to investigate more closely the effect of the additional fermi surface sheets in @xmath0 .\nan experimental study of the variation of @xmath17 vs @xmath21 in this system , that satisfies to a greater extent than cuprates the condition @xmath92 , would provide a vital test of the theory of magnetically mediated superconductivity .\nsuch a study would be feasible in @xmath0 if it orders ferromagnetically at positive pressure@xcite , and in the isostructural and isoelectronic compounds @xmath93 and @xmath94 that are 
expected to become similar to @xmath0 at very high pressure .\nfinally we reiterate that our calculations suggests that one should look for elevated transition temperatures in systems in which ( i ) @xmath28 is high , i.e. the electron density is not too low and effective band mass not too high , ( ii ) the lattice or carrier density can be tuned to the vicinity of a magnetic critical point in the metallic state , ( iii ) the electronic structure is quasi 2d rather than 3d , and ( iv ) antiferromagnetism ( or ising ferromagnetism ) is favored over ferromagnetism . a considerable number of candidate materials for further study of the predictions of the magnetic pairing model would seem to be available given current material fabrication and high pressure technology .\nthe experimental investigation of such systems , whether or not they prove to yield high transition temperatures , should help us to improve our understanding of magnetic pairing and perhaps also shed light on the more exotic models@xcite for normal and superconducting states that have been proposed for highly correlated electronic systems .\nwe would like to thank p. coleman , s.r .\njulian , p.b .\nlittlewood , a.j .\nmillis , a.p .\nmackenzie , d. pines , d.j .\nscalapino and m. sigrist for discussions on this and related topics .\nwe acknowledge the support of the epsrc and the royal society .","we compare predictions of the mean - field theory of superconductivity for nearly antiferromagnetic and nearly ferromagnetic metals in two dimensions . \n the calculations are based on a parametrization of the effective interaction arising from the exchange of magnetic fluctuations . \n the eliashberg equations for the transition temperature are solved including the full momentum dependence of the electron self - energy . 
\n the results show that for comparable parameters d - wave singlet pairing in nearly antiferromagnetic metals is generally much stronger than p - wave triplet pairing in nearly ferromagnetic metals in quasi two dimensions . \n the relevance to the layered materials , and in particular @xmath0 that exhibits p - wave triplet pairing , is discussed .",introduction\nmodel\nresults\ndiscussion\nconclusions\nacknowledgments
1,"let a billiard ball be shot from a corner of a rectangular billiard .\nconsider the ball as a point , and truncate the orbit somewhere at the boundary .\nthe truncated orbit of the ball generates a partition of the rectangular billiard into polygons , similar to figure [ billiard ] .\nmany of these triangles and quadrangles seem to have the same shape and size . in this paper\nwe will show that ( for a fixed shooting angle and stopping point ) the number of different areas is at most thirteen .\nthis universal upper bound is the sharpest possible .\nwe also consider rational shooting angles and irrational shooting angles for which the thirteen is never reached .\n[ billiard ]\nthe results in this paper are closely related to the three gap theorem ( see e.g. @xcite , @xcite ) and the four gap theorem ( see @xcite ) .\nthe statements of these two theorems are best illustrated by a picture ; see figure [ 3gt4gt ] . [ cols= "" > , < "" , ] now do the same starting from the other corners , traversing the square @xmath0 times with a line either with slope @xmath1 or @xmath2 .\nexplicit expressions for these four sets ( one for each corner ) are given in lemma [ lemma_aexpressions ] .\nfor an illustration , see the middle plot in figure [ construction ] .\n+ the key observation now is that intersection of all @xmath3 lines with @xmath4 ^ 2 $ ] gives exactly a truncated billiard orbit with slope @xmath1 , as is proved in lemma [ lemma_orbit ] .\nthis fact is illustrated in the right plot in figure [ construction ] .\nobviously not all @xmath3 lines actually contribute to the billiard orbit .\nhowever , there is a good reason to consider them all : the intercepts of the @xmath5 lines with positive slope form a truncated orbit of a rotation on the interval @xmath6 $ ] , see lemma [ lemma_rotation ] . for the lines with negative slope a similar result holds . 
having collected these insights\n, a simple counting argument suffices to obtain the upper bounds claimed in theorem [ theorem_13areas ] : + + * proof of theorem [ theorem_13areas ] * lemma [ lemma_orbit ] writes the billiard orbit as an intersection of the square @xmath4 ^ 2 $ ] with a set of lines .\nlet us concentrate on the lines with positive slope .\nby lemma [ lemma_rotation ] the intercepts of these lines form a rotation orbit on the interval @xmath6 $ ] .\nso by property [ prop_rotation ] they induce a partition of this interval in subintervals of _ at most _ three different lengths .\ndenote the set of these lengths by @xmath7 , where @xmath8 .\nfor the lines with negative slope , the intercepts are the numbers @xmath9 , @xmath10 .\nthey induce a partition of @xmath11 $ ] in subintervals having lengths in the same set @xmath12 .\nit now follows that vertical distances between adjacent parallel lines are in the set @xmath12 .\nwe will distinguish between three types of polygons : those that have no side which is part of the boundary of @xmath4 ^ 2 $ ] ( type @xmath13 ) , those that have exactly one such a side ( type @xmath14 ) and those that have two or more ( type @xmath15 ) .\n+ the polygons of type @xmath13 must be parallelograms .\nthe area of such a parallelogram is given by @xmath16 for some @xmath17 , and consequently they can have at most six different areas . 
+\na polygon of type @xmath14 that is triangular must be half of a rhombus of which the vertical diagonal has length @xmath18 , and therefore its area is @xmath19 .\nthere is at most one non - triangular type @xmath14 polygon , as is explained in figure [ pic_extreme ] .\nso polygons of type @xmath14 can have at most four different areas .\n+ polygons of type @xmath15 must be in one of the corners of @xmath4 ^ 2 $ ] , but not in @xmath20 since the orbit starts there .\nso this gives at most three more areas .\n+ putting everything together , it turns out that the number of different areas is bounded by thirteen .\n+ for the number of shapes a similar counting argument holds .\nthe number of parallelogram shapes is again six , since reflections do not count .\nthe triangles that are half of a rhombus can have at most six different shapes , since there are three types of rhombi which can be cut either horizontally or vertically .\nthe rest of the argument does not change , so there are at most three more different shapes than different areas , which establishes the upper bound of at most sixteen different shapes .\n+ the sharpness of these bounds follows from example [ ex_13flakes ] in section [ example ] .\n@xmath21 [ remark ] as the careful reader may have noted , the construction of the billiard orbit always gives a truncation on the left boundary or on the lower boundary of the square .\nso strictly speaking , theorem [ theorem_13areas ] is not proved in full generality yet .\nsuppose we have an orbit truncated at the upper or right boundary . 
by removing the last linear part or adding the next linear part\n, we can transform this orbit into an orbit truncated at the left or lower boundary .\nthis means that in the proof above , the rotation orbit on the interval @xmath6 $ ] contains one element more or one less than the rotation orbit on @xmath11 $ ] .\nnow property [ prop_strong ] tells us that vertical distances between adjacent parallel lines can still have at most three different values , completing the proof .\ntheorem [ theorem_13areas ] gives an upper bound for the number of different areas of shapes on the billiard table .\nsome natural questions remain . for example , what happens if @xmath1 is rational ?\ncan we prove sharper upper bounds under suitable conditions ?\nin this section we explore these properties .\n+ obviously , taking @xmath1 rational gives a special case .\nthe first thing to note is that the orbit will be periodic : if @xmath22 , then for @xmath23 @xmath24 a bit less trivial is the following result .\nthe best upper bound for @xmath25}^\alpha$ ] with @xmath26 is @xmath27 , but for all @xmath26 there is an @xmath28 such that @xmath29}^\alpha \leq 3 $ ] for @xmath30 .\nthese bounds are sharp .\n* proof * note that the areas of the polygons depend continuously on @xmath1 .\nso if we have an @xmath31 and @xmath32 such that @xmath25}^{\tilde\alpha } = 13 $ ] , then we can find @xmath33 such that the upper bound of thirteen is reached for all @xmath34 .\nsince this interval contains rationals , we see that rationality is not sufficient for a sharper upper bound .\n+ since the orbit is periodic , the partition does not change anymore if @xmath32 is large enough .\ntaking @xmath35 shows that @xmath13 is a sharp lower bound for the limiting number of shapes .\nfor the upper bound , suppose that @xmath22 .\nby lemma [ lemma_rotation ] the intercepts satisfy @xmath36 it follows that the numbers @xmath37 form a periodic rotation orbit on @xmath6 $ ] and therefore the set @xmath12 as 
defined in the proof of theorem [ theorem_13areas ] contains only one length if @xmath32 is large enough .\nif @xmath38 and @xmath39 are relatively prime , then this length is @xmath40 . now\na type @xmath13 polygon is a rhombus with area @xmath41 . since there is no endpoint of the orbit anymore , a type @xmath14 polygon is half of such a rhombus .\npolygons in the corners are also triangular , because the orbit touches all sides of the square before becoming periodic . these triangles are quarters of the rhombus , thus having area @xmath42 .\nthis makes at most three different areas in total . to see that this upper bound is sharp ,\nsee figure [ pic_rational ] . @xmath21 surprisingly , there exist irrational @xmath1 for which the upper bound of thirteen different areas is never reached : let @xmath43 denote the small golden mean .\nif @xmath44 for some @xmath45 , then @xmath25}^\alpha\leq 12 $ ] . * proof * consider the numbers @xmath37 that form a rotation orbit on @xmath6 $ ] .\nthe partition of @xmath6 $ ] induced by this orbit gives subintervals with lengths in a set @xmath12 .\nthis set @xmath12 changes if we extend the orbit ( i.e. we increase @xmath32 ) : some lengths will disappear and new lengths will be created . 
in @xcite and @xcite\nit was shown that the largest length is always the first to disappear .\na new length only pops up if there are only two lengths in @xmath12 , and the new length is the difference of these two existing lengths .\ntogether with the fact that @xmath46 , this is the basis of our argument .\n+ let @xmath47 .\nfrom the way points are added to the rotation orbit it is clear that we can choose @xmath32 such that @xmath6 $ ] will be partitioned in @xmath48 intervals of length @xmath1 and an interval of length @xmath49 .\nthis gives @xmath50 .\nextending the orbit with one more point transforms @xmath12 into @xmath51 and this is the first time that @xmath12 contains three lengths .\nincreasing @xmath32 further , @xmath12 will change into @xmath52 and then into @xmath53 .\nan inductive argument suffices to show that the ratios between the lengths in @xmath12 are preserved . +\nrecall that the areas of the parallelograms are determined by a product of two lengths in @xmath12 . 
by the above reasoning , if @xmath54 , then @xmath55 , which implies that the parallelograms can have at most five different areas .\nconsequently @xmath25}^\alpha\leq 12 $ ] .\nlet @xmath56 be an irrational number and consider the halfline @xmath57 .\nlet @xmath58 and define @xmath59 to be the squares of the form @xmath60 , with @xmath61 and @xmath62 integers , that are consecutively traversed by the halfline , see figure [ squares ] .\nchoosing an index @xmath0 , there exists @xmath63 such that @xmath64 taking fractional parts in both coordinates can be seen as mapping each of the squares @xmath65 to @xmath66 .\ntherefore , doing this for the above set gives @xmath67 for numbers @xmath37 defined by the recursion @xmath68 we will denote the set in ( [ a++ ] ) by @xmath69 .\nthe @xmath70 superscript reflects the fact that we started with a halfline in the first quadrant , so both coordinates are positive .\ndoing similar operations to halflines in the second , third and fourth quadrant , we can define sets @xmath71 , @xmath72 and @xmath73 respectively as follows : @xmath74 taking the union of these four sets and intersecting with @xmath4 $ ] gives us a billiard orbit , as is proved in the lemma below .\n[ lemma_orbitopen ] the billiard orbit @xmath75 satisfies @xmath76 ^ 2.\ ] ] * proof * observe that @xmath77 ^ 2\cap\bigcup_{a\in\left\{\left\{x\right\},1-\left\{x\right\}\right\ } } \bigcup_{b\in\left\{\left\{\alpha x\right\},1-\left\{\alpha x\right\}\right\ } } ( a , b),\end{aligned}\ ] ] and now take the union over all @xmath78.@xmath21 + + in the next lemma expressions similar to ( [ a++ ] ) are derived for @xmath71 , @xmath72 and @xmath73 .\n[ lemma_aexpressions ] let @xmath79 for @xmath80 .\nthen @xmath81\times[0,1 ) \cap \bigcup_{k=1}^n \left\{(x,-\alpha x+1-y_{-k}):x\in\mathbb{r}\right\},\\ a^{-- } & = & ( 0,1]^2 \cap \bigcup_{k=1}^n \left\{(x,\alpha x + y_{-k}):x\in\mathbb{r}\right\ } , \\ a^{+- } & = & [ 0,1)\times(0,1 ] \cap \bigcup_{k=1}^n 
\left\{(x,-\alpha x+1-y_k):x\in\mathbb{r}\right\},\end{aligned}\ ] ] * proof * define the functions @xmath82 by @xmath83 , @xmath84 and @xmath85 . applying these functions to the left hand side of ( [ a++ ] ) , we get @xmath86 , @xmath87 and @xmath88 . on the other hand\n, @xmath89 whence application of @xmath90 to the right hand side of ( [ a++ ] ) leads to @xmath91\times[0,1)\cap \bigcup_{k=1}^n f\bigl(\left\{(x,\alpha x + y_k):x\in\mathbb{r}\right\}\bigr)\\ & = & ( 0,1]\times[0,1)\cap \bigcup_{k=1}^n \left\{(x,-\alpha x + 1-y_{-k}):x\in\mathbb{r}\right\},\end{aligned}\ ] ] so for @xmath71 we established the equality claimed in the lemma .\nthe other two equalities for @xmath72 and @xmath73 follow from a similar reasoning since @xmath92 and @xmath93 @xmath21 the numbers @xmath37 and @xmath94 satisfy a nice relation , as is shown in the following lemma .\n[ lemma_rotation ] let @xmath95 .\nthen the numbers @xmath96 form a rotation orbit on the interval @xmath6 $ ] .\nthey are given by @xmath97 * proof * the recursion ( [ recursion ] ) can be rewritten as @xmath98 and therefore @xmath99 letting @xmath100 and @xmath101 , this reduces to @xmath102 since @xmath103 , we have @xmath104 , which leads to @xmath105 on the other hand , for @xmath106 , @xmath107 since @xmath31 is irrational . by definition\nwe have @xmath108 , and hence @xmath109 solving for @xmath37 gives the result .\n@xmath21 + + in lemma [ lemma_orbitopen ] we already derived an expression for @xmath75 , but this is not so easy to analyze directly .\nin the next lemma we describe @xmath110}^\alpha$ ] as the union of two collections of lines intersected with @xmath4 ^ 2 $ ] .\nall lines in the first collection have slope @xmath1 and all lines in the second collection have slope @xmath2 . 
[ lemma_orbit ]\nlet @xmath111 and @xmath112 .\nthen @xmath113}^\alpha = [ 0,\frac{1}{2}]^2\cap \bigcup_{u\in\left\{+,-\right\}}\bigcup_{k =- n}^n \left\{(x , l_k^u(x):x\in\mathbb{r})\right\}\ ] ] * proof * this lemma will be proved by taking closures in the equation in lemma [ lemma_orbitopen ] .\n@xmath114}^\alpha,\ ] ] since @xmath115 is a continuous function from @xmath116 to @xmath117 . on the other hand ,\n@xmath118 ^ 2\cap\bigcup_{u , v\in\left\{+,-\right\}}a^{uv } } = [ 0,\frac{1}{2}]^2\cap\bigcup_{u , v\in\left\{+,-\right\}}\overline{a^{uv}},\ ] ] and since @xmath119 is a finite collection of lines intersected by a ` half open ' unit square its closure is the same collection of lines but now intersected by the closed square @xmath120 ^ 2 $ ]\n. therefore , @xmath121 ^ 2\cap\bigcup_{\begin{array}{l } k =- n\\k\neq 0\end{array}}^n\left\{(x , l_k^+(x)):x\in\mathbb{r}\right\}\ ] ] now note that since @xmath122 we have @xmath123 ^ 2\cap\left\{(x , l_0^+(x)):x\in\mathbb{r}\right\ } = \emptyset.\ ] ] intersecting both sides of ( [ + + ] ) with @xmath4 ^ 2 $ ] gives @xmath123 ^ 2\cap\overline{a^{++}}\cup\overline{a^{-- } } = [ 0,\frac{1}{2}]^2\cap\bigcup_{k =- n}^n\left\{(x , l_k^+(x)):x\in\mathbb{r}\right\}\ ] ] analogously it follows that @xmath123 ^ 2\cap\overline{a^{+-}}\cup\overline{a^{-+ } } = [ 0,\frac{1}{2}]^2\cap\bigcup_{k =- n}^n\left\{(x , l_k^-(x)):x\in\mathbb{r}\right\}\ ] ] combination of the last two equations gives the result .\nin this section we present an example in which the upper bounds of theorem [ theorem_13areas ] are reached .\nthis proves sharpness of the bounds . 
[ ex_13flakes ]\nlet @xmath124 and choose @xmath125 .\nthe corresponding orbit is shown in figure [ pic_13flakes ] .\nuse lemma [ lemma_rotation ] to find the numbers @xmath37 and let @xmath126 denote the three different vertical distances between adjacent parallel lines .\nthe areas of the shapes are of the following form : @xmath127 calculating these thirteen areas indeed gives thirteen different values , where a precision of two decimals suffices .\nthe flakes @xmath128 , @xmath129 and @xmath130 have the same areas as @xmath131 , @xmath132 and @xmath133 respectively , so the maximal number of sixteen different shapes is also reached .\nwe checked the calculations by using the outcomes to determine the area of @xmath4 ^ 2 $ ] .\nthe author thanks michel dekking and cor kraaikamp for their useful comments .","we study the geometry of billiard orbits on rectangular billiards . \n a truncated billiard orbit induces a partition of the rectangle into polygons . \n we prove that thirteen is a sharp upper bound for the number of different areas of these polygons . \n billiard orbit , geometry of partitions 11b75",introduction\nrotations\nrational angles and a golden exception\nlemmata and their proofs\nsharpness of the bounds\nacknowledgement
2,"the diffusion of atoms and molecules on surfaces is of importance in a wide range of research fields and applications , consequently , a wide range of dedicated experimental and theoretical methods have been developed over the years @xcite .\none of these techniques is quasi elastic helium atom scattering ( qhas ) .\nthis method which has received a significant boost with the availability of the helium spin echo ( hse ) apparatus@xcite , provides a unique opportunity to follow atomic scale motions on time scales of pico to nano - seconds . with new data\ncomes the need for new or improved models to interpret the data and extract the underlying physical properties of the surface system .\none commonly used interpretation model is based on a 2d langevin simulation which allows the extraction of a potential energy surface term and a friction parameter from the experimental data\n. one obvious drawback of this model is that the complex dynamics of the substrate atoms are not explicitly treated and correlations between the motion of the adsorbate and substrate particles can not be accounted for .\nanother approach , which to the best of our knowledge was only applied twice to interpret qhas measurements , is using molecular dynamics ( md ) simulations@xcite . in these md simulations , the motion of the surface atoms and their interaction with their neighbors are explicitly calculated and correlation effects between the motion of the adsorbate and the substrate atoms are inherently included . on the down side ,\nthese explicit md simulations are computationally expensive . 
in this manuscript\nwe describe a numerical study which aims to quantify the differences between these two approaches and probe the validity of using the simpler and less time consuming langevin approach .\nthis comparison is performed by calculating the observables with both simulations under similar conditions .\nthe paper is organized as follows : we start by introducing some useful relations and definitions , then we describe the two numerical simulations and explain how they were tuned to simulate similar surface systems , and finally we present the results of the comparison .\nsurface diffusion is a general process which describes the motion of particles ranging from atoms to macro molecules which are confined to a surface @xcite . for most systems surface diffusion\nis essentially a classical process - with hydrogen diffusion at low temperatures being an example of an exception @xcite . for molecular adsorbates\na simple and sometimes sufficiently good description can be obtained by ignoring the internal degrees of freedom , although it should be noted that these degrees of freedom can play an important role in some systems @xcite .\nthere are several physical properties which are accessible to experiments and can be used to characterize surface motion ; particularly popular choices are the tracer diffusion coefficient in the case of isolated diffusion and the chemical diffusion coefficient for the case of collective motion .\nanother way of characterizing motion is using pair correlation functions which , unlike the diffusion coefficients mentioned above , contain a full statistical description of the motion and its underlying mechanism .\nthe pair correlation function we use can be written as a sum of the self correlation and the distinct correlation functions , @xmath0 defined by @xcite @xmath1 these correlation functions can be interpreted as a measure of the probability of finding a particle at location @xmath2 at time @xmath3 , given that 
the same @xmath4 or a different @xmath5 particle was at the origin at time @xmath6 .\nthe self correlation function describes the complete dynamics of an individual adsorbate .\nit is also the dominant contribution for dilute adsorbate coverages . in this work\nwe will focus on the zero coverage limit , neglecting the contribution from the distinct correlation function .\none advantage of using these pair correlation functions is their close relation to quantities which can be measured in experiments .\nin particular fourier transforming @xmath7 to the momentum domain gives @xmath8 , the self part of the intermediate scattering function ( isf ) @xmath9 where @xmath10 is proportional to the momentum exchange parallel to the surface in a scattering experiment . within the kinematic scattering approximation\nit can be shown that the hse technique mentioned above measures this quantity directly@xcite , where the experimentally accessible @xmath10 values range between 0.01 to a few inverse angstroms and the times , @xmath3 , range from 0.1ps to a few nano seconds . using the same scattering approximation\nit can be shown that for a time of flight helium atom scattering apparatus , the observable quantity is the temporal fourier transform of @xmath11 known as the dynamic structure factor ( dsf ) , @xmath12 , where @xmath13 is the energy exchange during the scattering event . generally speaking , when random motion takes place on a surface @xmath11 decays with time reflecting the loss of correlation of the position of the surface particles , a decaying isf corresponds to a peak in the dsf which is centered around @xmath14 and has a width which is inversely proportional to the decay rate .\nthis peak is called the quasi - elastic peak ( qep ) and its width , @xmath15 , is often termed the quasi - elastic broadening . 
whether one measures the isf ( using hse @xcite ) or the dsf ( using time of flight helium scattering @xcite ) , an analytical or numerical model is needed to extract the physical properties of the surface dynamics from the data .\none particularly useful analytical model for surface diffusion of adsorbates , which can be used to calculate the quasi - elastic broadening , is the jump diffusion model which was first derived by chudley and elliott @xcite for neutron scattering measurements of bulk diffusion . in\nthis model vibrational motions within the adsorption sites ( intra - cell motions ) are ignored and the inter - cell jumps between one adsorption site and another are assumed to be instantaneous .\ngenerally speaking , the chudley - elliott model is suited for systems where i ) the energy barrier for diffusion is large compared with the thermal energy , ii ) the adsorption sites form a bravais lattice , and iii ) the adsorbate coverage is sufficiently small that the motion is not affected by the presence of other adsorbates , i.e. we can ignore the influence of the distinct pair correlation function .\nthe result of this simple model is an isf which decays exponentially with time , @xmath16 , which corresponds to a dsf containing a lorentzian peak centered at @xmath14 with a finite width , @xmath17 . 
for the case of the chudley - elliott model the dependence of the quasi - elastic broadening on the momentum transfer\nis given by @xmath18 where the sum is over a discrete set of @xmath19 possible jump vectors connecting two adsorption sites and @xmath20 is the jump rate for a particular jump vector .\neq . ( [ eq : jump qhas-1 ] ) contains all the information about the jump diffusion process ; hence , the experimental isf allows us to extract the different jump rates .\nfurthermore , if we make the assumption that the jump rate @xmath21 is of the form @xmath22 we can find the potential barrier @xmath23 the adsorbate has to overcome by finding the temperature dependence of the @xmath21 coefficients .\nan analytic approach in general , and eq .\n( [ eq : jump qhas-1 ] ) in particular , provides significant insights into the dynamics when analyzing experimental data ( e.g. @xcite ) . on the other hand , using equation [ eq : jump qhas-1 ] , which treats the surface as a discrete set of point - like adsorption sites that an isolated point - like particle jumps between , is typically restricted to relatively simple surface dynamics systems where this ideal - jump model is valid . as mentioned above ,\nan alternative approach which has been extensively used in the last few decades is to numerically calculate the trajectories using a set of parameters which describe the various interactions , extract observables such as the dsf or isf from the trajectories and , by comparison with the experimental observables , improve the interaction parameters until a good fit is obtained . 
for practical reasons the comparison is typically performed on 1d quantities such as the dependence of the quasi - elastic broadening as function of momentum transfer , temperature or coverage rather than direct comparison of the two dimensional dsf or isf functions@xcite .\nthe two numerical approaches we compare in this work are molecular dynamics ( md ) and 2d langevin simulations .\nthe first , provides an explicit treatment of the interactions between all the particles , whereas the second provides fast computation and a simple separation of the static and dynamic interactions characterizing the surface and has been heavily used to interpret quasi - elastic helium scattering experiments @xcite .\nmd simulations include the degrees of freedom of both the adsorbate and substrate atoms . in order to mimic experimental systems in a realistic way\n, complex many - body interactions can be used @xcite .\nhowever , the purpose of this work is to study how well langevin simulations can reproduce the explicit approach of md modeling .\nthis can be studied with particularly simple interactions for the md simulation ; pair - wise harmonic interaction between the substrate atoms and a morse potential between the adsorbate and the substrate atoms .\nthe parameters of these two interaction models and the mass of the particles were initially chosen to resemble an experimentally relevant situation - the motion of a sodium atom ( mass = 23 amu ) on a flat ( 001 ) copper surface .\nthe na / cu(001 ) system has been extensively studied experimentally , both in the regime of low coverage where the sodium atom can be assumed to move as an isolated adsorbate@xcite and at higher coverages where correlated motion effects dominate @xcite .\nfurthermore , this system represents a rare case where a md simulation has been used to interpret quasi - elastic helium scattering measurements , which conveniently supplies us with a set of parameters for both the harmonic and the morse 
potentials @xcite . for the harmonic term , @xmath24 , a single force constant between nearest neighbors of @xmath25\nwas shown to provide a good description of the copper bulk phonons @xcite .\nas mentioned above , the surface - adsorbate potential was modeled with a morse - like potential @xmath26 where @xmath27 is the distance between the jth substrate atom and the adsorbate and the sum runs over all the substrate atoms .\nthe following values were used for the parameters @xcite @xmath28 which reproduce the experimental measurements of the adsorption height and vibrational frequencies .\nthe geometry we used included a copper solid consisting of 7 layers of @xmath29 lattice cells with a total of 896 atoms .\nperiodic boundary conditions were imposed parallel to the surface . the bottom layer of the slab was frozen to simulate the rest of the bulk layers and to fix the center of mass in its place .\nthe substrate atoms at @xmath6 were placed in their bulk equilibrium positions , and were given a random initial velocity using a maxwell - boltzmann distribution .\nthe atoms were then allowed to relax ; after this relaxation period a single adsorbate atom was added on the surface and the system was allowed to relax again to the desired temperature .\nthe simulation was carried out in the microcanonical ensemble .\nthe newtonian equations of motion were solved using beeman 's algorithm @xcite for both substrate atoms and the single adsorbate .\na popular approach for interpreting quasi - elastic helium scattering experiments is using a 2d langevin simulation .\nwhen inter - adsorbate interactions can be ignored ( the zero coverage limit ) , the force is given by @xmath30 where @xmath31 is the adsorbate 's 2d momentum .\nthe equation includes a constant potential energy surface ( pes ) term @xmath32 which is the potential the adsorbate experiences when the surface atoms are at their equilibrium points ( see sec .\n( [ sec : constructing - a - comparable ] ) for a description of 
the procedure used to obtain the pes ) .\nthe two terms which replace the explicit treatment of the dynamic interaction between the adsorbate and substrate atoms are a dissipation term @xmath33 which leads to energy losses and a random fluctuating force @xmath34 ( typically chosen as a white noise force ) which allows energy to be supplied to the adsorbate .\nthese two terms are not independent , as they are related through the fluctuation dissipation theorem@xcite @xmath35 it should be noted that if the issue of restricted computational time is ignored it would be preferable to extend the simulation to include a three dimensional motion of the adsorbate .\nhowever , since the height of an adsorbate above the surface is typically restricted to a very narrow range ( fractions of an angstrom ) and correspondingly the vibrational period perpendicular to the surface is about an order of magnitude faster than the motion of the adsorbate parallel to the surface , the effect of the vertical motion is typically assumed to be averaged out and a 2d langevin approach is used for analysis . 
as this is the most frequently used approach we chose to use it in our comparison .\nwhen fitting experimental data with a langevin simulation ( in the zero coverage limit ) , the free parameters are those used to define the pes and the friction parameter @xmath33 .\nthe parameters of the pes provide important insight into the average interaction between an adsorbate and the surface ; they can be used to compare the energy of multiple adsorption sites @xcite and provide an important benchmark for density functional theory calculations @xcite .\nthe friction parameter reflects the atomic - scale energy transfer mechanism and plays an important role in a wide range of research fields and applications @xcite .\nsince measurements of atomic scale friction of isolated adsorbates are scarce , the ability to extract such values from langevin analysis of quasi - elastic helium scattering measurements is particularly important @xcite .\nthe current theoretical understanding of surface friction is rather limited ; it is however customary to separate the frictional coupling into two main contributions , namely , electronic and phononic friction . within the langevin approach ,\nthe friction coupling is a fitting parameter and its value reflects the total friction regardless of its origin ; this is in contrast with md simulations , where the friction is not a parameter but rather a result of the explicit interactions between atoms .\nconsequently , since typically the interactions calculated in md simulations are between the ions , the only friction mechanism which is simulated is phononic friction , and systems where electronic friction is important can not be accurately studied with simple md simulations of the type described above . 
as mentioned above , in a langevin simulation the energy transfer due to the substrate motion or other mechanisms\nis accounted for using a damping term and a fluctuating force term , both of which are determined by a single friction parameter , @xmath33 . in this work , the friction parameter is used as an adjustable fitting parameter .\nthe langevin simulation also requires the adiabatic interaction potential , i.e. the pes . when analyzing experimental data , the pes\nis derived using adjustable fitting parameters , however in this study , our goal is to perform a relevant comparison with a specific md model . in order to do this we chose the pes to be the time averaged potential in the md simulation .\nthe procedure for deriving the pes is the following : 1 .\ngenerate a 2d grid above the periodic unit of the substrate top layer .\n2 . for every point in the grid ,\nfix the lateral coordinates of the adsorbate , leaving all other degrees of freedom free .\nallow the system to relax to the equilibrium geometry . 
in the explicit md simulation , where @xmath36 is the force acting on atom @xmath37 along its free coordinate @xmath19 in the system and @xmath38 is the velocity .\nif this product is positive then the atoms are moved according to the force ; if it is negative the velocity is set to zero .\nthe quenching procedure described in the previous stage continues until the change in the system 's total energy between time steps drops to a negligible level .\nthe forces acting on the atoms are the same forces used in the explicit md simulation described above .\nthe value of the pes at the grid point is set to be the potential energy of the entire system - contributions from the adsorbate - substrate interaction as well as interaction between the substrate 's atoms .\nusing this procedure and the interaction parameters mentioned earlier ( for the harmonic and morse interactions ) the potential difference between a local minimum and saddle point of the pes was found to be 75 mev ; this energy barrier corresponds to barrier - to - thermal - energy ratios in the range 2.9 - 6.2 for the temperatures this study was performed at ( from 300k down to 140k ) .\nthe same pes was used for all adsorbate masses in this work .\nboth of the simulations used in this work generate trajectories of the adsorbate . from each trajectory\nwe can construct the isf of the adsorbate using eq .\n( [ eq : isf self ] ) . 
figure ( [ fig : isf ] ) shows an example of the isf from a 10 nanosecond trajectory calculated by the langevin simulation .\nthe isf contains 2 main features with different characteristic time scales : i ) a slow decay of the isf which takes place over tens of pico seconds and ii ) a rapid initial decay and an oscillatory pattern , which can be seen more clearly in the inset in figure ( [ fig : isf ] ) which depicts the isf at short times .\nboth of the features mentioned above are characteristic of surface diffusion systems and have been seen in experimental and theoretical work @xcite .\n[ figure [ fig : isf ] : example of an isf function . the inset shows the isf at short times , where a combination of decaying oscillations as well as a decaying exponential can be seen . their origin is explained in the text . ]\n[ figure [ fig : dsf ] : example of a dsf function calculated from the isf shown in figure [ fig : isf ] , focusing on the two peaks which are located at the origin . the much sharper qep dominates the dsf , whereas the underlying broad qeb can be seen more clearly in the inset plot . ]\nthe slow exponential decay is due to intercell diffusion , i.e. transitions between local minima in a corrugated potential , as was discussed in section [ sec : basic - definitions - and ] .\nthe quasi - elastic broadening ( or the decay rate of the isf ) , @xmath15 , and its dependence on @xmath39 and t can be related to the dynamics either using simple analytical theory ( e.g. equation [ eq : jump qhas-1 ] ) or as will be demonstrated below using more detailed numerical models . 
in the following section we will use @xmath40 to compare the diffusion process calculated by the two numerical simulations .\nthe oscillation and decay seen at short time scales , is related to the motion within the adsorption site .\nthe oscillation period reflects the vibrational motion of the adsorbate whereas the fast decay reflects the loss of phase coherency due to the random nature of this motion , a process sometimes referred to as intra - cell diffusion @xcite . in the dsf\nthis intra - cell motion appears as three peaks , two inelastic peaks located at the energy gain / loss values which correspond to vibrational frequency and one additional peak centered at @xmath14 , as shown in figure ( [ fig : dsf ] ) .\nthe width of all three peaks is related to the rapid decay due to the phase loss of the intra - cell motion mentioned above . since this decay is typically much faster than that due to the inter - cell motion , the widths of all three peaks are substantially larger than that of the qep ( i.e. the quasi - elastic broadening ) . in this work we will refer to the intra - cell motion contribution centered at @xmath14 as the quasi - elastic base ( qeb ) to differentiate it from the much sharper qep which is also centered at @xmath14 . as mentioned above , in many cases the time scale of the intercell diffusion is much slower compared to the intra - cell one , and the differentiation of the different contributions mentioned above is valid . in section [ sub : estimating - the - friction ] we will make use of this separation scheme in order to extract values for the frictional coupling within the adsorption site .\nas mentioned above , under many circumstances , including the conditions encountered in this work , inter - cell motion leads to an exponentially decaying isf equivalent to a lorentzian qep in the dsf . 
under these conditions , the quasi - elastic broadening @xmath15 , which can be extracted either from the decay rate of the isf or from the width of the qep peak in the dsf , can be used to characterize the inter - cell motion from both experiments and theory @xcite .\na method which allowed us to reliably extract , @xmath15 , from the calculated isf is to delay the fitting procedure to times which are sufficiently long to avoid mixing the contributions of the intra - cell motion mentioned above .. $ ] was fitted to a single exponential , with @xmath41 being advanced in time at each iteration until the decay times between successive iterations differed by less than 1% ] we start with the case of an adsorbate with a mass of 23 amu , representing the na / cu(001 ) system mentioned earlier .\nfigure [ fig : comparison optimal friction 23 amu ] shows a comparison of @xmath15 calculated using the two simulation approaches .\nthe left panel shows an example for calculations performed at 160k , the md results are shown using the black dot symbols , where as langevin results using different friction values in the range 0.44thz-0.72thz are plotted with coloured symbols according to the legend .\none immediate feature which can be seen for both simulations is the oscillatory nature of the width of the qep as function of the momentum transfer value .\nthis is a characteristic feature of jump diffusion as can be seen from the chudley elliot equation ( [ eq : jump qhas-1 ] ) .\na second observation which can be made is that the langevin simulation can reproduce the md result quite well if the friction parameter is set to a value of 0.56thz , can be expressed with the dimensionless quantity @xmath42 . 
]\n, we will refer to the friction parameter which provides the best fit as the `` optimal friction value '' , @xmath43 .\nthis particular value is consistent with the results obtained in the past when analyzing experimental measurements of na / cu(001 ) with langevin simulations @xcite . for lower friction values\nwe observe a slower jump rate due to weak coupling between the substrate and adsorbate , while for higher friction values the shape of the curve is narrower , indicating the dominance of single jumps ( equation ( [ eq : jump qhas-1 ] ) reverts to a single oscillating term when only nearest neighbor jumps take place ) .\na ) quasi - elastic broadening , @xmath17 , calculated along the ( 1,1,0 ) crystal azimuth for a 23 amu adsorbate at 160k .\nmd results are shown with full black circles alongside with the results of langevin simulations with different friction parameters as indicated in the legend .\nb ) comparison between the @xmath17 calculated by the md ( black circles ) and those calculated by the langevin using the optimal friction values indicated in the legend alongside the relevant surface temperature .\n, width=650 ] the right panel of figure [ fig : comparison optimal friction 23 amu ] shows the same comparison in the temperature range 140k-300k .\nfor each temperature we plot the md calculation together with the langevin simulation obtained using the optimal friction values , @xmath44 , i.e. the friction values which gave the minimal standard deviation between the two @xmath17 curves .\nagain one can see that the langevin simulation can reproduce the md values quite well , however , the optimal friction parameters ( indicated in the legend ) are not identical for the different temperatures , instead there is a subtle but clear trend where the friction parameter , @xmath43 , increases with the temperature , i.e. the langevin simulation we used can not exactly reproduce the md results if a single temperature independent friction value is used . 
in the previous section we saw that we can find a good agreement between the quasi - elastic broadenings , @xmath40 , calculated by the two numerical models with only one free parameter , @xmath33 - the frictional coupling\nhowever , in order to optimize the fit we had to slightly adjust the friction according to the temperature . during the last two decades various systems have been measured using qhas , most of which were analyzed using langevin simulations where a single , temperature independent , friction coefficient was assumed @xcite .\nif the temperature dependence of the friction is significant for some of these systems , the analysis method which was applied in the past to extract an activation energy for these systems , resulted in a small but systematic error which needs to be taken into account . in order to study and understand this apparent temperature dependence we performed further calculations for heavier adsorbates , as this allows us to change the strength of the frictional coupling @xcite while leaving the inter - atomic forces unchanged .\nfigure [ fig : optimal friction vs temperature all masses ] shows the friction values which give the best quasi - elastic broadening match between the two simulations at different temperatures for 100 amu and 200 amu adsorbates .\nthe resolution of the friction parameter is @xmath45 and @xmath46 for the 100 and 200 amu masses respectively . 
from top to bottom :\nthe optimal friction as a function of temperature for 23 , 100 and 200 amu adsorbates respectively .\nresults for the `` straight '' ( over the bridge site ) crystal azimuth ( 110 ) are plotted in green and results for the `` diagonal '' ( over the top ) azimuth ( 100 ) in red .\na linear form was fitted for both crystal azimuths .\nnote that the relative change of the optimal friction parameter in the temperature range 150 - 300k can be as large as 100% for the heavier adsorbates we simulated .\n, width=377 ] two main observations can be made when comparing the results of the different masses : i ) the friction values needed to fit the two simulations are significantly reduced for heavier adsorbates .\nthis is the expected trend , since heavier adsorbates have a lower vibration frequency and are expected to have a weaker coupling to the substrate @xcite .\nii ) the need to adjust the friction parameter according to the temperature in order to get an agreement between the two simulations is more pronounced for the heavier adsorbates .\nthus , this temperature dependent friction which is rather subtle for the 23 amu adsorbate , and would have a small effect on the interpretation of experimental data , becomes a more significant effect for heavier adsorbates .\nwe have shown above , that in order to mimic the md results using a langevin simulation we need to allow the friction parameter to increase with temperature .\none explanation for this is that by changing the friction we simply make use of our only free parameter to compensate for the fact that the langevin simulation can not exactly mimic the md results , either due to the fundamental differences between the two , or due to our particular choice for the pes . 
on the other hand ,\nsince the friction is not an explicit parameter in the md simulation , another possibility is that the friction coupling changes with temperature in the md simulation and that the comparison with the langevin simulation is revealing this trend . in order to try and differentiate between these possibilities\nwe have attempted to extract an effective `` friction parameter '' from the md simulation and study its temperature dependence .\nwe achieve this by extracting the width of the quasi - elastic base ( qeb ) , as mentioned in section ( [ sec : basic - definitions - and ] ) .\nthe width of the qeb is governed by the dephasing rate of the intracell motion , i.e. it is related to the friction coupling within the adsorption site , a relation which has been shown , both analytically and numerically @xcite .\nin fact , if one looks at the lowest order of the analytically derived expression for the dsf , the half width of the qeb ( in the angular frequency domain ) is simply equal to @xmath33 @xcite .\nwhile the qeb width will undoubtedly be related to the frictional coupling , the accuracy of the simple relation between the two mentioned above is unknown . in particular ,\nthe analytical relation is valid within certain approximations @xcite .\nfurthermore , the friction in the langevin simulation reflects the average energy exchange rate , both within and outside the adsorption site , whereas the qeb is only related to the intracell motion within the adsorption site , hence the two properties are obviously not identical . in order to validate our approach , we first start by applying this method on the dsf calculated by langevin . since langevin simulations include an explicit friction value , our ability to reproduce this value from the qeb acts as a self consistency check for our method . 
in order to assist the fitting procedure and\nseparate any contributions from intercell diffusion , the dsf calculations were performed along the straight azimuth for @xmath47 , conditions under which the qep has a negligible width due to the jump diffusion process ( minima values of @xmath15 in eq .\n[ eq : jump qhas-1 ] ) .. , where @xmath48 is the frequency resolution of the calculated dsf .\nthis range was chosen to eliminate the qep contribution which manifests itself in the dsf as a single data point at @xmath14 .\nthe fitting range extended to a frequency which provided enough data points for the fit , yet avoided contribution from the lorentzian centered about the vibration frequency ] the fit for the langevin data is shown in figure ( [ fig : langevin qeb ] ) .\nat each temperature , the dsf corresponds to a calculation using the optimal friction value from figure ( [ fig : optimal friction vs temperature all masses ] ) .\nthe inset in figure ( [ fig : langevin qeb ] ) shows the friction values extracted from the qeb width ( blue circles ) versus the friction parameters used in the simulation ( denoted @xmath49 .\noverall , the two values are very close , with the qeb underestimating the friction parameters by 15%-20% , similar calculation for higher masses ( 100 amu and 200 amu , data not shown ) produce even smaller deviations between the two .\nthus , we conclude that the qeb width provides a reasonable way to estimate the friction within the accuracy stated above .\nfitting the qeb peak calculated by the langevin simulations for the different temperatures .\nthe full symbols are the dsf values calculated by the langevin simulations , performed using @xmath43 friction parameters .\nthe solid lines show the lorentzian fit described in the text which was used to extract the qeb width .\nthe inset compares @xmath43 with the friction values extracted from the width of the lorentzian peak which best fitted the qeb@xcite . 
next , we applied the same procedure on the md data in order to extract effective friction values and check how they change with temperature .\nfigure [ fig : qeb md ] shows the qeb peaks calculated for the different temperatures and different adsorbate masses , and a lorentzian peak fit ( full lines ) to the qeb .\nfirst , we note that the lorentzian fit to the qeb peak is not quite as good as it was for the spectra calculated by the langevin simulation , mostly due to low frequency peaks related to the surface vibrations and an incomplete subtraction of the qep peak ( assumed to be a delta function at the diffraction condition ) .\nnevertheless , we see that the values extracted from the lorentzian width are quite close to the langevin friction values which were obtained by fitting the @xmath17 curves ( @xmath43 ) .\nthe inset depicts the comparison between @xmath43 and qeb widths ( extracted from the md simulation ) for the different temperatures and adsorbate masses .\nan obvious feature which can be seen from these graphs is that for all three masses the qeb widths extracted from the md calculations increase as a function of temperature , more or less following the trend of @xmath43 .\nconsequently , we conclude that the need to increase @xmath43 as a function of t when trying to reproduce the md results with the langevin simulation represents a property of the frictional coupling of the md simulation , which is then revealed in the comparison with the langevin simulations . in other words , the fact that we had to increase @xmath43 with temperature in order to mimic the md results with the langevin code does not indicate a discrepancy between the two simulations . 
from top to bottom , results of the @xmath50 fit to a single lorentzian for 23 , 100 and 200 amu adsorbates .\nthe inset shows the qeb widths extracted from md data simulations at different temperatures alongside the optimal friction values , @xmath43 , used by the langevin simulation to fit the md @xmath17 curves .\nwe have compared two different numerical approaches for interpreting adsorbate diffusion on a solid substrate , namely , md and langevin simulations .\na major difference between these two approaches is the substitution of the dynamic substrate which is explicitly simulated by the md code , with a friction damping term and a stochastic force in the langevin simulation . since this substitution can not accurately account for correlations between the relative motion of the substrate and adsorbate atoms which takes place in the md , a certain discrepancy in the simulated dynamics is anticipated .\nfor example , a substrate phonon creates a time dependent distortion of the potential energy surface on which the adsorbate moves , hence one could expect that the rate of single jumps and longer jumps would be affected by the frequency and amplitude of the substrate vibrations . 
while it is obvious that such correlations will take place to some degree , a quantitative assessment of the discrepancies was missing in the literature , and it was unclear whether they are sufficiently large to affect the interpretation of realistic ( noisy ) experimental data .\nthe observables we chose to compare are the isf and dsf correlation functions , focusing on the width of the quasi - elastic peak , @xmath15 , in particular .\nthe dependence of the quasi - elastic peak width on the momentum transfer and sample temperature provides a sensitive measure of the motion rate and mechanism , and is also accessible to helium scattering experiments @xcite .\nthe comparison we performed showed that for the particular systems we simulated , the two simulations can produce very similar observables , using the friction parameter as the only free parameter used to fit the two .\nthus , within the conditions we simulated , correlation effects do not seem to lead to any noticeable discrepancies between the two simulations , and the langevin simulation provides a good approach for simulating the surface dynamics .\nwe did notice that the optimal friction values which we obtained from fitting the two simulations increased slightly with temperature , an effect which was more significant for adsorbates with a higher mass .\none possible interpretation of this observation is that the need to increase the friction parameter of the langevin simulation at higher temperatures is an indication of a discrepancy between the two numerical approaches ( i.e. 
we are compensating for fundamental differences between the two simulations by adjusting the fit parameter ) .\nanother interpretation is that the frictional coupling rate is increasing with temperature in the md simulation and that the comparison with the langevin simulation ( which produces an optimal friction parameter ) is simply revealing this fact .\nwe used the quasi - elastic base width as a method to estimate the frictional coupling from the dephasing rate of the motion within the adsorption site and extract effective friction values from the md data .\nour results show that the effective friction values extracted using this method are in close proximity to those used to fit the langevin simulation .\nfurthermore , the effective friction values also show an increase with temperature , supporting the second interpretation mentioned above , i.e. the frictional coupling increases with temperature in the md simulation and the need to adjust the friction parameter of the langevin simulation to fit the two simulations does not indicate a discrepancy between the two numerical approaches .\nin conclusion , when using the particular interaction models mentioned above , adsorbate masses ranging from 23 to 200 amu , and temperatures within the range of 140k to 300k , we do not observe significant differences between the langevin and md simulations .\nthus , even if differences exist , they are subtle and should not affect the analysis of experimental data with similar or larger noise levels . 
an explanation for this lack of discrepancy , might be that the relatively fast time scales which characterize the substrate motion lead to an averaging effect which reduces the importance of explicit correlations and allows us to treat the interaction as a sum of a static interaction ( pes ) and a stochastic force with a good accuracy .\nit is also worth noting , that in the past when applying langevin simulations for data analysis , it was assumed that the friction is independent of surface temperature .\nwhile the particular trend we observed in the md simulation reflects our choice of model for simulating the substrate ( harmonic potential ) and adsorbate ( morse potential ) and is not directly related to other systems and interaction models , it is worth remembering that the friction might change as function of temperature also in other systems .\nif this temperature dependence is not negligible and is not taken into account , systematic errors might be produced when extracting physical properties from the simulations , in particular the energy barrier for diffusion deduced from arrhenius graphs .\nfinally , we assume that there will be other systems and conditions under which correlations will produce noticeable effects , however , these will probably require substantially different time scales ( faster adsorbate motion or slower substrate motions ) and it should be interesting to study such systems in the future .\nthe authors would like to thank prof .\nerio tossati for valuable scientific discussions .\nthis work was supported by the israeli science foundation ( grant no .\n2011185 ) and the european research council under the european unions seventh framework program ( fp/2007- 2013)/ erc grant 307267 .\npeter fouquet , mark r. johnson , holly hedgeland , andrew p. jardine , john ellis , and william allison .\nmolecular dynamics simulations of the diffusion of benzene sub - monolayer films on graphite basal plane surfaces . , 47(11):26272639 , 2009 .\na. 
p. jardine , e. y. m. lee , d. j. ward , g. alexandrowicz , h. hedgeland , w. allison , j. ellis , and e. pollak .\ndetermination of the quantum contribution to the activated motion of hydrogen on a metal surface : h / pt(111 ) .\n, 105(13):136101 , 2010 .\na. p. graham , f. hofmann , j. p. toennies , l. y. chen , and s. c. ying .\nexperimental and theoretical investigation of the microscopic vibrational and diffusional dynamics of sodium atoms on a cu(001 ) surface . , 56(16):10567 , october 1997 .\ngil alexandrowicz , pepijn r. kole , everett y. m. lee , holly hedgeland , riccardo ferrando , andrew p. jardine , william allison , and john ellis .\nobservation of uncorrelated microscopic motion in a strongly interacting adsorbate system .\n, 130(21):67896794 , may 2008 .\na p jardine , h hedgeland , d ward , y xiaoqing , w allison , j ellis , and g alexandrowicz . probing molecule surface interactions through ultra - fast adsorbate dynamics : propane / pt(111 ) . , 10(12):125026 , 2008 .","diffusion studies of adsorbates moving on a surface are often analyzed using 2d langevin simulations . \n these simulations are computationally cheap and offer valuable insight into the dynamics , however , they simplify the complex interactions between the substrate and adsorbate atoms , neglecting correlations in the motion of the two species . \n the effect of this simplification on the accuracy of observables extracted using langevin simulations was previously unquantified . \n here we report a numerical study aimed at assessing the validity of this approach . \n we compared experimentally accessible observables which were calculated using a langevin simulation with those obtained from explicit molecular dynamics simulations . \n our results show that within the range of parameters we explored langevin simulations provide a good alternative for calculating the diffusion process , i.e. 
the effect of correlations is too small to be observed within the numerical accuracy of this study and most likely would not have a significant effect on the interpretation of experimental data . \n our comparison of the two numerical approaches also demonstrates the effect temperature dependent friction has on the calculated observables , illustrating the importance of accounting for such a temperature dependence when interpreting experimental data .",introduction and motivation\n[sec:basic-definitions-and]basic definitions and methods for spectra interpretation.\nnumerical models and interpretation methods\ncomparison of md and langevin quasi-elastic broadenings\nsummary and conclusions\nacknowledgements
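The article in the record above extracts the quasi-elastic broadening by fitting the long-time tail of the ISF to a single exponential, advancing the start of the fit window until the decay rates from successive iterations differ by less than 1%. The following is a minimal sketch of that idea on synthetic data. The function names (`decay_rate`, `converged_decay_rate`), the log-linear least-squares fit, the step size, and the synthetic signal are all assumptions made for illustration; the article does not specify how its fit was implemented.

```python
import math


def decay_rate(t, isf, t_start):
    """Least-squares fit of log(isf) = c - alpha * t for t >= t_start; returns alpha.

    Assumption: a log-linear fit stands in for the single-exponential fit.
    """
    pts = [(ti, math.log(yi)) for ti, yi in zip(t, isf) if ti >= t_start and yi > 0]
    if len(pts) < 2:
        raise ValueError("not enough points left to fit")
    n = len(pts)
    mean_t = sum(p[0] for p in pts) / n
    mean_y = sum(p[1] for p in pts) / n
    sxx = sum((p[0] - mean_t) ** 2 for p in pts)
    sxy = sum((p[0] - mean_t) * (p[1] - mean_y) for p in pts)
    return -sxy / sxx


def converged_decay_rate(t, isf, step, tol=0.01):
    """Advance the fit start time until successive decay rates agree within tol."""
    t_start = t[0]
    previous = decay_rate(t, isf, t_start)
    while True:
        t_start += step
        current = decay_rate(t, isf, t_start)  # raises once the window is empty
        if abs(current - previous) <= tol * abs(current):
            return current, t_start
        previous = current


# Synthetic ISF (hypothetical): slow decay with rate 0.1 plus a fast,
# oscillating short-time contribution that the delayed fit should skip.
t = [0.01 * i for i in range(5001)]
isf = [0.7 * math.exp(-0.1 * x) + 0.3 * math.exp(-2.0 * x) * math.cos(10.0 * x) for x in t]

alpha, t_fit = converged_decay_rate(t, isf, step=1.0)
print(f"decay rate ~ {alpha:.3f} (fit window starts at t = {t_fit:.1f})")
```

On real trajectories the ISF tail is noisy, so a weighted or nonlinear fit may be preferable to the log-linear shortcut used here.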
3,"the energy concentration problem in the time - frequency domain plays a crucial role in signal processing .\nthe foundation of this problem comes from the research group of bell labs in the 1960s @xcite .\nthe problem states that for any given signal @xmath0 with its fourier transform ( ft ) @xmath1 the energy ratios of the duration and bandwidth limiting of the signal @xmath0 , i.e. , @xmath2 and @xmath3 of @xmath4 both in fixed time @xmath5 $ ] and frequency @xmath6 $ ] domains , satisfy the following inequality @xmath7 let @xmath8 be the total energy of @xmath0 . by the parseval theorem @xcite , the energies in the time and frequency domains are equal , i.e. , @xmath9 . without loss of generality , we consider the unit energy signals throughout this paper , i.e. , @xmath10 . the important constant @xmath11 in eq . ( [ eq.bound ] ) is the eigenvalue of the zero order prolate spheroidal wave functions ( pswfs ) .\nthe pswfs were originally used to solve the helmholtz equation in prolate spheroidal coordinates by means of separation of variables @xcite . in the 1960s , slepian _ et al . _ @xcite found that pswfs are solutions for the energy concentration problem of bandlimited signals @xcite .\ntheir real - valued pswfs are solutions of the integral equation @xmath12 where @xmath13 are eigenvalues of pswfs . here @xmath5 $ ] and @xmath6 $ ] are the fixed time and frequency domains , respectively .\nimportant properties of pswfs are given in @xcite .\nthe following properties follow from the general theory of integral equations and are stated without proof .\n1 . eq.([eq.1dpswfs ] ) has solutions only for real , positive eigenvalues @xmath14 .\nthese values form a monotonically decreasing sequence , @xmath15 , such that @xmath16 .\n2 . 
to each @xmath14 there corresponds only one eigenfunction @xmath17 with a constant factor .\nthe functions @xmath18 form a real orthonormal set in @xmath19;\r)$ ] .\nan arbitrary real @xmath20-bandlimited function @xmath21 can be written as a sum @xmath22 where @xmath23 .\nthese properties are useful in solving the energy concentration problem and other applications @xcite .\n@xcite naturally extended them to higher dimension and discussed their approximation in some special case in the following years .\nafter that , the works on this functions are slowly developed until 1980s a large number of engineering applied this functions to signal processing , such as bandlimited signals extrapolation , filter designing , reconstruction and so on @xcite .\nthe pswfs have received intensive attention in recent years .\nthere are many efforts to extend this kind of functions to various types of integral transformations .\n_ @xcite generalized pswfs associated with the finite fractional fourier transform ( frft ) and applied to the sampling theory .\nzayed _ et al .\n_ @xcite generalized pswfs not only associated with the finite frft but also associated with the linear canonical transforms ( lcts ) and applied to sampling theory .\n@xcite discussed the pswfs associated with lcts in detail and presented the maximally concentrated sequence in both time and lcts - frequency domains .\nthe wavelets based pswfs constructed by walter\n_ et al . _\n@xcite have some desirable properties lacking in other wavelet systems .\n@xcite developed the pswfs with noncommutative structures in clifford algebra .\nthey not only generalized the pswfs in clifford space ( cpswfs ) , but also extended the transform to clifford lct .\nbut they just gave some basic properties of this functions and have not discussed details of the energy relationship for square integrable signals . 
in this paper , we consider the energy concentration problem for hypercomplex signals , especially for quaternionic signals @xcite associated with quaternionic lcts ( qlcts ) in detail . an improved definition of qpswfs is considered for odd and even quaternionic signals .\nthe study is a substantial improvement on the one that appeared in @xcite .\nthe qlct is a generalization of the quaternionic ft ( qft ) and quaternionic frft ( qfrft ) .\nthe qft and qfrft have been widely used for color image processing and signal analysis in recent years @xcite .\nsince the qlct has more degrees of freedom than the qft and qfrft , it is expected to perform better in color image processing . in the present paper , we generalize the 1d pswfs under the qlcts to the quaternion space , which are referred to as quaternionic pswfs ( qpswfs ) . the improved definition of qpswfs associated with the qlcts is studied and some of their important properties are analyzed . in order to find the relationship of @xmath24 for any square integrable quaternionic signal , we establish the parseval theorem and study the energy concentration problem associated with the qlcts . in particular , we use quaternion - valued functions multiplied by two special chirp signals on both sides as a bridge between the qlcts and the qfts . the main goal of the present study is to develop the energy concentration problem associated with qlcts .\nwe find that the proposed qpswfs are the most energy concentrated quaternionic signals .\nthe body of the present paper is organized as follows . in sections [ sec.quaternionalgebra ] and [ sec.qlct ] , some basic facts of quaternionic algebra and qlcts are given .\nmoreover , the parseval identity for quaternionic signals associated with the ( two - sided ) qlcts is presented . in section [ s4 ] , the improved definition and some properties of qpswfs associated with qlcts are discussed .\nsection [ s5 ] presents the main results and includes two parts . 
in subsection\n[ s5.1 ] , we introduce the existence theorem for the maximum energy concentrated bandlimited function on a fixed spatial domain associated with the qlcts . in subsection\n[ s5.2 ] , we discuss the energy extremal properties in fixed spatial and qlcts - frequency domains for any quaternionic signal . in particular , we give an inequality to present the relationship of energy ratios for any quaternionic signal , which is analogue to the high dimensional real signals .\nmoreover , examples of energy concentrated ratios between the truncated gaussian function and qpswfs are presented , which can intuitively illustrate that qpswfs are the more energy concentrated signals .\nfinally , some conclusion are drawn in section [ s6 ] .\nthe present section collects some basic facts about quaternions @xcite , which will be needed throughout the paper . for\nall what follows , let @xmath25 be the _\nhamiltonian skew field of quaternions _ : @xmath26 which is an associative non - commutative four - dimensional algebra .\nthe basis elements @xmath27 obey the hamilton s multiplication rules : @xmath28 and the usual component - wise defined addition . in this way\nthe quaternionic algebra arises as a natural extension of the complex field @xmath29 .\nthe _ quaternion conjugate _ of a quaternion @xmath30 is defined by @xmath31 we write @xmath32 and @xmath33 which are the _ scalar _ and _ vector parts _ of @xmath30 , respectively .\nthis leads to a norm of @xmath34 defined by @xmath35 then we have @xmath36 , @xmath37 , @xmath38 , for any @xmath39 . by ( [ eq.4partshnumber ] ) , a quaternion - valued function or , briefly ,\nan @xmath25-valued function @xmath40 can be expressed in the following form : @xmath41 where @xmath42 @xmath43 . 
for convenience s sake , in the considerations to follow we will rewrite @xmath44 in the following symmetric form @xcite : @xmath45 properties ( like integrability , continuity or differentiability ) that are ascribed to @xmath44 have to be fulfilled by all components @xmath46 @xmath47 . in order to state our results , we shall need some further notations .\nthe linear spaces @xmath48 ( @xmath49 ) consist of all _ @xmath25-valued functions _ in @xmath50 under left multiplication by quaternions , whose @xmath51-th power is lebesgue integrable in @xmath50 : @xmath52 in this work , the _ left _ quaternionic inner product of @xmath53 is defined by @xmath54 the reader should note that the norm induced by the inner product ( [ eq.hinnerproduct ] ) , @xmath55 coincides with the @xmath56-norm for @xmath44 , considered as a vector - valued function .\nthe angle between two non - zero functions @xmath57 is defined by @xmath58 the superimposed argument is well - defined since , obviously , it holds @xmath59\nthe lct was first proposed by moshinsky and collins @xcite in the 1970s .\nit is a linear integral transform , which includes many special cases , such as the fourier transform ( ft ) , the frft , the fresnel transform , the lorentz transform and scaling operations . in a way , the lct has more degrees of freedom and is more flexible than the ft and the frft , but with similar computational costs as the conventional ft . due to the mentioned advantages , it is of natural interest to extend the lct to a quaternionic algebra framework .\nthese extensions lead to the _ quaternionic linear canonical transforms _ ( qlcts ) . due to the non - commutative property of multiplication of quaternions , there are different types of qlcts . 
as explained in more detail below , we restrict our attention to the _ two - sided _ qlcts @xcite of 2d quaternionic signals in this paper .\n[ * two - sided qlcts * ] [ def : qlcts ] let @xmath60 be a matrix parameter such that @xmath61 for @xmath62 the two - sided qlcts of signals @xmath63 are given by @xmath64 where the kernel functions are formulated by @xmath65 \sqrt{d_1 } e^{\i ( { c_1 d_1 \over 2 } ) u^2 } , & { \rm for } \;\ ,\nb_1 = 0,\end{array}\right.\end{aligned}\ ] ] and @xmath66 \sqrt{d_2 } e^{\j ( { c_2 d_2 \over 2 } ) v^2 } , & { \rm for } \;\ , b_2 = 0.\end{array}\right.\end{aligned}\ ] ] it is significant to note that when @xmath67 , the qlct of @xmath44 reduces to @xmath68 , where @xmath69is the two - sided qft of @xmath44 .\nnote that when @xmath70 @xmath71 , the qlct of a signal is essentially a chirp multiplication and is of no particular interest for our objective interests .\nwithout loss of generality , we set @xmath72 @xmath71 throughout the paper .\nlet @xmath73 . using the euler formula for the quaternionic linear canonical kernel we can rewrite eq .\n( [ eq.2sideqlcts ] ) in the following form : @xmath74 where @xmath75 the above equation clearly shows how the qlcts separate * real * signals @xmath76 into four quaternionic components , i.e. , the even - even , odd - even , even - odd and odd - odd components of @xmath76 . from eq .\n( [ eq.2sideqlcts ] ) if @xmath77 , then the two - sided qlcts @xmath78 has a symmetric representation @xmath79 where @xmath80 @xmath47 are the qlcts of @xmath46 and they are @xmath25-valued functions . under suitable conditions , the inversion of two - sided quaternionic linear canonical transforms of @xmath81 can be defined as follows .\n[ * inversion qlcts * ] suppose that @xmath82\n. 
then the inversion of two - sided qlcts of @xmath81 are defined by @xmath83 where @xmath84 and @xmath85 for @xmath86 .\nthe following subsection describes the important relationship between qlcts and qft , which will be used to establish the main results in section [ s5 ] .\nnote that the qlcts of @xmath44 multiple the chirp signals @xmath87 on the left and @xmath88 on the right can be regarded as the qft on the scale domain . since @xmath89 where @xmath90 is related to the parameter matrix @xmath91 in eq .\n( [ eq.2sideqlcts ] ) .\n[ * relation between qlct and qft * ] let @xmath60 be a real matrix parameter such that @xmath92 for @xmath62 the relationship between two - sided qlcts and qfts of @xmath93 are given by @xmath94 where @xmath95 .\nthis subsection describes energy theorem of two - sided qlcts @xcite , which will be applied to derive the extremal properties of qlcts in section [ s5 ] .\n[ * energy theorem of the qlcts * ] [ th.planchereltheorem ] any 2d @xmath25-valued function @xmath96 and its qlct @xmath97 are related by the parseval identity @xmath98 for @xmath96 , direct computation shows that @xmath99.\end{aligned}\ ] ] applying the definition of qlcts , we have @xmath100\\ & = & { \bf sc}\left[\int_{\r^4}k^{\i}_{a_1}(x , u)\bm{f}(x , y ) k^{\j}_{a_2}(y , v)\overline{\mathcal{l}(\bm{f})(u , v)}dxdydudv \right]\\ & = & \int_{\r^4}{\bf sc}\left[k^{\i}_{a_1}(x , u)\bm{f}(x , y ) k^{\j}_{a_2}(y , v)\overline{\mathcal{l}(\bm{f})(u , v)}\right]dxdydudv.\end{aligned}\ ] ] with @xmath101 for any @xmath102 and @xmath103 , @xmath104 , we have @xmath105 dxdydudv \\ & = & \int_{\r^4}{\bf sc}\left[\bm{f}(x , y ) \overline{k^{\i}_{a^{-1}_1}(x , u)\mathcal{l}(\bm{f})(u , v ) k^{\j}_{a^{-1}_2}(y , v ) } \right ] dxdydudv\\ & = & { \bf sc}\left [ \int_{\r^2}\bm{f}(x , y)\overline{\int_{\r^2 } k^{\i}_{a^{-1}_1}(x , u)\mathcal{l}(\bm{f})(u , v ) k^{\j}_{a^{-1}_2}(y , v)dudv } dxdy\right]\\ & = & { \bf sc}\left [ \int_{\r^2}\bm{f}(x , y)\overline{\bm{f}(x , 
y)}dxdy\right ] = \int_{\r^2}\bm{f}(x , y)\overline{\bm{f}(x , y)}dxdy=\| \bm{f}\|^2.\end{aligned}\ ] ] hence this completes the proof .\ntheorem [ th.planchereltheorem ] shows that the energy for an @xmath25-valued signal in the spatial domain equals to the energy in the qlcts - frequency domain .\nthe parseval theorem allows the energy of an @xmath25-valued signal to be considered on either the spatial domain or the qlcts - frequency domain , and exchange the domains for convenience computation .\nthe energy theorem of @xmath44 and @xmath95 associated with their qft is given by @xmath106\nin the following , we first explicitly present the definition of pswfs associated with qlcts . consider the 1d pswfs @xcite ,\nlet us extend the pswfs to the quaternionic space associated with qlcts .\n[ * qpswfs * ] the solutions of the following integral equation in @xmath107 @xmath108 are called the quaternionic prolate spheroidal wave functions ( qpswfs ) @xmath109 associated with qlcts . here , the complex valued @xmath110 are the eigenvalues corresponding to the eigenfunctions @xmath111 . the real parameter matrix @xmath112 with @xmath113 , @xmath114 , for @xmath115 .\nthe real constant @xmath116 is a ratio about the frequency domain @xmath117\times[-\sigma,\sigma]$ ] and the spatial domain @xmath118\times[-\tau,\tau]$ ] , where @xmath119 .\n( [ eq.qpswf ] ) is named the _\nfinite qlcts form of qpswfs_. note that for simplicity of presentation , we write @xmath120 and @xmath121 . the solutions of this integral equation in eq .\n( [ eq.qpswf ] ) are well established in some special cases .\n( i ) : : in the square region @xmath122\times[-\tau,\tau]$ ] , if qlcts are degenerated to 2d fourier transform ( ft ) , then qpswfs becomes the 2d real pswfs , which is given by @xmath123 here , if @xmath124 is separable , i.e. , @xmath125 , then the 2d pswfs can be regarded as the product of two 1d pswfs . 
to aid the reader ,\nsee @xcite for more complete accounts of this subject .\n( ii ) : : in a unit disk , the qlcts are degenerated to 2d ft , then the qpswfs between the circular pswfs @xcite @xmath126 we call the right - hand side of eq .\n( [ eq.qpswf ] ) is the finite qlcts .\nhowever , @xmath127 only for the @xmath128 .\nthere is a scale factor @xmath116 added to the parameter matrix , which is different from the definition of qlcts\n. some important properties of qpswfs will be considered in this part , which are crucial in solving the energy concentration problem .\nlet @xmath129 be the qpswfs associated with their qlcts and @xmath130 .\nthen @xmath131 are solutions of the following integral equation @xmath132 where @xmath133 for @xmath134 are the eigenvalues corresponding to @xmath135 and @xmath113 , @xmath114 , for @xmath115 , and @xmath136 .\n( [ eq.lowpass ] ) is named the _ low - pass filtering form of qpswfs associated with qlcts_. we shall show that eq .\n( [ eq.lowpass ] ) is derived by the eq .\n( [ eq.qpswf ] ) .\nstraightforward computations of the right - hand side of eq .\n( [ eq.lowpass ] ) show that @xmath137 applying the following two important equations @xcite to the last integral , @xmath138 then we have @xmath139e^{\mathbf{j}v_2v}dv_1dv_2.\end{aligned}\ ] ] combining eq.([eq.qpswf ] ) with the parameter matrices @xmath140 , and @xmath141 , we have @xmath142e^{\mathbf{j}v_2v}dv_1dv_2\\ & = & \frac{1}{2\pi}\int_{\bm{\sigma}}e^{\mathbf{i}v_1u } \left [ \lambda_n e^{\mathbf{-i}\frac{cd_1}{2b_1}(\frac{b_1v_1}{c})^2 } \mathbf{i}^{n } \sqrt{cb_1 } \bm{\psi}_{n}\big(\frac{b_1v_1}{cb_1},\frac{b_2v_2}{cb_2}\big ) \sqrt{cb_2 } \mathbf{j}^{n } e^{\mathbf{-j}\frac{cd_2}{2b_2}(\frac{b_2v_2}{c})^2 } \right ] e^{\mathbf{j}v_2v}dv_1dv_2\\ & = & \frac{1}{2\pi } \lambda_n \int_{\bm{\sigma}}e^{\mathbf{i}v_1u } \left[e^{\mathbf{-i}\frac{cd_1b_1}{2}(\frac{v_1}{c})^2 } \mathbf{i}^{n } \sqrt{cb_1 } \bm{\psi}_{n}\big(\frac{v_1}{c},\frac{v_2}{c}\big ) \sqrt{cb_2 } 
\mathbf{j}^{n } e^{\mathbf{-j } \frac{cd_2b_2}{2}(\frac{v_2}{c})^2 } \right ] e^{\mathbf{j}v_2v}dv_1dv_2\\ & = & \frac{1}{2\pi } \lambda_n c^3\sqrt{b_1b_2}\mathbf{i}^{n } \left[\int_{\bm{\tau } } e^{\mathbf{i}w_1cu } e^{\mathbf{-i}\frac{cd_1b_1}{2}w_1 ^ 2 } \bm{\psi}_{n}(w_1,w_2 ) e^{\mathbf{-j}\frac{cd_2b_2}{2}w_2 ^ 2 } e^{\mathbf{j}w_2cv}dw_1dw_2 \right ] \mathbf{j}^{n}\\ & = & \lambda_n c^3\sqrt{b_1b_2}\mathbf{i}^{n } \left [ \lambda_n e^{-\mathbf{i}\frac{c(\frac{-a_1}{{b_1}^2})}{2b_1}(ub_1)^2 } \frac{(-\mathbf{i})^n}{\sqrt{\mathbf{i } } } \sqrt{cb_1\mathbf{i } } \bm{\psi}_{n}\big(\frac{ub_1}{b_1},\frac{vb_2}{b_2}\big ) \sqrt{cb_2\mathbf{j } } \frac{(-\mathbf{j})^n}{\sqrt{\mathbf{j } } } e^{-\mathbf{j}\frac{c(\frac{-a_2}{{b_2}^2})}{2b_2}(vb_2)^2 } \right ] \mathbf{j}^{n}\\ & = & \lambda_n^2 c^4b_1b_2e^{\mathbf{i}\frac{ca_1}{2b_1}u^2 } \bm{\psi}_{n}(u , v)e^{\mathbf{j}\frac{ca_2}{2b_2}v^2 } = c^4b_1b_2\lambda_n^2 \tilde{\bm{\psi}}_{n}(u , v)=:\mu_n \tilde{\bm{\psi}}_{n}(u , v).\end{aligned}\ ] ] the proof is complete . 
for the specific parameters @xmath143 , eq .\n( [ eq.lowpass ] ) becomes the low - pass form of qpswfs associated with qft @xmath144 to obtain the following property , we shall show a special convolution theorem of any @xmath25-valued signal and real - valued signal .\n[ lem.convolution ] let @xmath145 and @xmath146 associated with their qft @xmath147 and @xmath148 with @xmath149 , where @xmath150 , @xmath151 and @xmath152 .\nthe convolution of @xmath44 and @xmath153 is defined as @xmath154 then the qft for @xmath155 holds @xmath156 let @xmath157 and @xmath158 , straightforward computation the qft in eq.([eq.2sideqft ] ) of the convolution between @xmath0 and @xmath153 shows that @xmath159 g(m , n ) e^{-\mathbf{j}yv}e^{-\mathbf{j}nv}dxdydmdn.\end{aligned}\ ] ] with @xmath160 , the last integral becomes @xmath161 since we have known @xmath148 is real - valued , then we have @xmath162 this completes the proof .\nnote that if the real signal @xmath163 with @xmath148 is real valued , then @xmath164 .\nit means that @xmath165 the convolution theorems for quaternion fourier transform was given in @xcite .\nlemma [ lem.convolution ] is following the idea of theorem 13 and lemma 14 in @xcite . for completeness\n, we proof the convolution formula in eq .\n( [ eq.convolution ] ) .\nlet @xmath129 be the qpswfs associated with qlcts and @xmath166 , @xmath131 satisfies @xmath167 which extends the integral of @xmath135 from @xmath168 to @xmath50 .\nlet @xmath169 eq .\n( [ eq.lowpass ] ) is actually a convolution of @xmath170 with two - dimensional _ sinc _ kernel @xmath171 as follows @xmath172 denote @xmath173 . from lemma [ lem.convolution ] , let @xmath174 , the @xmath163 and its qft @xmath175 is real valued function .\nthen taking qft to the both sides of eq .\n( [ eq.convl1 ] ) , we have @xmath176 immediately , we obtain that @xmath177 for @xmath178 , i.e. 
, @xmath179 here @xmath180 from lemma [ lem.convolution ] , taking the inverse qft on both sides of the above equation , it follows that @xmath181 satisfies eq .\n( [ eq.allpass ] ) , which extends the integral domain of @xmath135 from @xmath168 to @xmath50 .\nthe propositions [ pro.eigenvalues ] and [ pro.orthogonalintau ] follow from the general theory of integral equations of hermitian kernel and are stated without proof @xcite .\n[ pro.eigenvalues ] eq .\n( [ eq.lowpass ] ) has solutions for real or complex @xmath182 .\nthese values are a monotonically decreasing sequence , @xmath183 , and satisfy @xmath184 . [ pro.orthogonalintau ] for different eigenvalues @xmath185 , the corresponding eigenfunctions @xmath186 are an orthonormal set in @xmath168 , i.e. , @xmath187 the eigenfunctions @xmath188 form an orthonormal system in @xmath50 , i.e. , @xmath189 combining eq .\n( [ eq.lowpass ] ) , the orthogonality in @xmath50 can be immediately deduced as follows @xmath190\nin the present section , we will consider the energy concentration problem of bandlimited @xmath25-valued signals in fixed spatial and qlcts - frequency domains .\nthe definitions and notations of bandlimited @xmath25-valued signals associated with qlcts and qft are introduced in the following .\nan @xmath25-valued signal @xmath191 with finite energy is @xmath192-bandlimited associated with qlcts , if its qlcts vanishes for all @xmath193 outside the region @xmath192 , i.e. , @xmath194 denote @xmath195 the set of @xmath192-bandlimited @xmath25-valued signals associated with qlcts , i.e. , @xmath196 an @xmath25-valued signal @xmath191 with finite energy is @xmath192-bandlimited associated with qft , if its qft vanishes for all @xmath193 outside the region @xmath192 , i.e. , @xmath197 denote @xmath198 the set of the @xmath192-bandlimited @xmath25-valued signals associated with qft , i.e. 
, @xmath199 note that the relationship between qlct and qft for an @xmath25-valued signal @xmath44 @xmath200 that is to say if @xmath201 , then for the @xmath202 , @xmath193 is also in @xmath192 , that means @xmath203\times [ \frac{-\sigma}{b_2},\frac{\sigma}{b_2}]= : \tilde{\bm{\sigma}}. \end{aligned}\ ] ] then @xmath204 , because @xmath205 now we pay attention to the energy concentration problem associated with qlcts . to be specific\n, the energy concentration problem associated with qlcts aims to obtain the relationship of the following two energy ratios for any @xmath25-valued signal @xmath44 with finite energy in a fixed spatial and qlcts - frequency domains , i.e. , @xmath168 and @xmath192 , @xmath206 by the parseval identity in eq .\n( [ eq.parsevalqft ] ) , the two ratios can also be obtained by @xmath207 note that the value of @xmath208 and @xmath209 are real values in @xmath210 $ ] . in this part\n, we only consider the energy problem for @xmath211 , i.e. , @xmath212 and @xmath204 .\nconcretely speaking , given an unit energy @xmath204 , the energy concentration problem is finding the maximum of @xmath208 , i.e. , @xmath213 denote the maximum @xmath208 as follows @xmath214 let @xmath215 , we can also reformulate @xmath208 as follows @xmath216 we conclude that the maximum @xmath217 can be taken if @xmath218 . 
to derive this fact , the generally cross - correlation function @xmath219 of @xmath0 and @xmath220 was introduced at first @xcite , @xmath221 consider the @xmath222\n, we have @xmath223 from the complex - valued schwarz s inequality , @xmath224 the @xmath225 takes the maximum value if @xmath226 , where @xmath227 is a constant .\nsimilarly , we can define the cross - correlation function @xmath228 of @xmath25-valued signals @xmath44 and @xmath229 as follows @xmath230 since the quaternionic schwarz s inequality also holds .\nthen to get the maximum value of @xmath231 , the relationship between @xmath44 and @xmath232 satisfies @xmath233 , where @xmath234 is a constant . here\n, we find that @xmath235 to achieve the maximum @xmath208 , the two functions should be the same except a constant factor .\nfor this reason , there exists a constant @xmath236 such that @xmath218 .\nlet @xmath237 and @xmath238 are the qft for @xmath239 and @xmath240 , respectively .\ntaking qft to both sides of the equation @xmath218 , we have @xmath241 since @xmath242 , then @xmath243 is also in @xmath244 , i.e. , @xmath245 from lemma [ lem.convolution ] , taking the inverse qft to the above equation , we have @xmath246 substituting @xmath247 to the above equation , we have @xmath248 which is the low - pass filter form of qpswfs .\nnow we show that @xmath192-bandlimited @xmath25-valued signals satisfying the low - pass filter form eq .\n( [ eq.low ] ) can reach the maximum @xmath217 .\n[ th.theoremexist ] if the eigenvalues of the integral equation @xmath249 have a maximum @xmath236 , then @xmath250 .\nthe eigenfunction corresponding to @xmath251 is the function such that @xmath217 are reached . 
for any @xmath192-bandlimited signal @xmath240 ,\nconstruct a function @xmath252 as follows @xmath253 let the qft of @xmath252 as @xmath254 , it follows that @xmath255 it means that @xmath256 .\ndenote the energy ratio @xmath257 for @xmath252 in the fixed spatial domain @xmath168 as follows @xmath258 we conclude that for any @xmath259 , @xmath208 can not exceed the @xmath257 .\ndirect computations show that the energy of the signal @xmath252 is given as follows @xmath260\\ & = & { \bf sc } \left [ \int_{\r^2}\bm{\tilde{f}}_{\bm{\tau}}(x , y)\bm{\tilde{s}}(x , y)dxdy \right]\\ & = & { \bf sc } \left [ \int_{\r^2}p_{\bm{\tau}}(x , y)\bm{\tilde{f}}(x , y)\overline{\bm{\tilde{s}}(x , y)}dxdy\right].\end{aligned}\ ] ] on the other hand , we consider that @xmath261\\ & = & { \bf sc } \left [ \int_{\tilde{\bm{\sigma}}}\f(\bm{\tilde{s}})(u , v ) \overline{\f(\bm{\tilde{f}})(u , v)}dudv \right].\end{aligned}\ ] ] since @xmath262 and @xmath263 \right|^2 \leq \left|\int_{\tilde{\bm{\sigma}}}\f(\bm{\tilde{s}})(u , v ) \overline{\f(\bm{\tilde{f}})(u , v)}dudv\right|^2,\end{aligned}\ ] ] simplifying the above three inequalities , we obtain that @xmath264 from which it follows that @xmath265 we also have the following result for @xmath266 @xmath267\right|^2\\ \nonumber & \leq & \left|\int_{\r^2}p_{\bm{\tau}}(x , y)\bm{\tilde{f}}(x , y)\overline{\bm{\tilde{s}}(x , y)}dxdy\right|^2\\ \nonumber & \leq & \int_{\r^2}p_{\bm{\tau}}(x , y)\bm{\tilde{f}}(x , y)\overline{\bm{\tilde{f}}(x , y)}dxdy \int_{\r^2}p_{\bm{\tau}}(x , y)\bm{\tilde{s}}(x , y)\overline{\bm{\tilde{s}}(x , y)}dxdy.\end{aligned}\ ] ] here , we take @xmath268 into two parts , i.e. , @xmath269 , and use the schwarz inequality for the above inequality .\nclearly , @xmath270 , then @xmath271 summarizing , we have @xmath272 that means , for any @xmath273 , @xmath274 . if @xmath275 , then eq . 
( [ eq.1 ] ) and eq .\n( [ eq.2 ] ) must be equalities .\nthis is attained only by setting @xmath276 with @xmath277 .\nit means that @xmath240 is an eigenfunction of eq .\n( [ eq.exist ] ) and @xmath208 is the corresponding eigenvalue , i.e. , @xmath278 . at last , we will show that @xmath250 , and the eigenfunction corresponding to @xmath251 is the function such that @xmath217 is reached . by definition of @xmath279 , there exists a maximum @xmath280 and we denote the maximum @xmath208 as @xmath217 and the corresponding signal as @xmath281 .\nas we have shown , the @xmath217 corresponding the eigenfunction satisfies @xmath276 .\nhere , @xmath282 corresponds to the maximum eigenvalue of @xmath251 .\nhence , @xmath283 . in order to prove that @xmath284 , it suffices to show that @xmath281 is an eigenfunction of the integral equation eq .\n( [ eq.exist ] ) , or equivalently , that with @xmath285 defined as @xmath252 in eq .\n( [ eq.s ] ) with @xmath286 .\nobviously , @xmath287 and @xmath288 , because @xmath217 is maximum by assumption .\nthe proof is complete .\ntheorem [ th.theoremexist ] shows that for arbitrary unit energy @xmath192-bandlimited @xmath25-valued signal associated with qlcts the maximum value of @xmath208 can be achieved by the qpswfs .\nin fact , from the symmetry theorem of fourier theory @xcite , there is also a similar integral equation for time - limited signals , which have the maximum @xmath289 .\nthe prove of this conclusion is similar to theorem [ th.theoremexist ] .\nif the eigenvalues of the integral equation @xmath290 have a maximum @xmath236 , then @xmath289 have a maximum number @xmath291 and @xmath292 .\nthe eigenfunction corresponding to @xmath251 is the function such that @xmath291 are reached .\n( [ eq.fptauf ] ) is equivalent to eq .\n( [ eq.exist ] ) with @xmath293 and @xmath294 . 
in this section\n, we will discuss the relationship of @xmath295 in eq .\n( [ eq.extremal ] ) from three cases : * @xmath191 is a @xmath192-bandlimited signal associated with qlcts .\n* @xmath191 is a @xmath168-time - limited signal .\n* @xmath191 is an arbitrary signal .\nthe first case follows form the general theory of the @xmath201 in section [ s5.1 ] .\nas we have known @xmath240 is in @xmath244 when @xmath201 , i.e. , @xmath212 . from theorem [ th.theoremexist\n] , we know that the maximum @xmath208 equals the maximum eigenvalue @xmath296 in eq .\n( [ eq.exist ] ) . using the expansion for the @xmath273 , @xmath297 where @xmath298 .\nit is clear that @xmath299 hence , @xmath300 . if @xmath301 , then @xmath302 . if @xmath303 , then we can find a signal @xmath201 whose energy ratio in spatial domain equals @xmath208 , and in this case , @xmath304 is not unique .\nthe second case means @xmath305 . from the property of symmetry of the qlct\nwe conclude that all the properties for signals @xmath201 have corresponding time - limited counterparts . reversing @xmath306 and @xmath193 ,\nwe conclude that @xmath307 . specially , if @xmath308 , then @xmath309 . for the third case ,\nconsidering arbitrary signals with @xmath310 , we aim to find the maximum @xmath311 and the corresponding signal @xmath191 .\nif @xmath300 , as we noted in the case of @xmath201 , we can find @xmath204 with energy ration @xmath208 , hence , @xmath312 .\ntherefore , we only need to consider the case of @xmath313 .\n[ th.maxbeta ] the maximum @xmath291 of @xmath311 must satisfy the following equation @xmath314 where @xmath296 is the largest eigenvalues of eq .\n( [ eq.lowpass ] ) and the corresponding @xmath240 for the maximum @xmath291 is given by @xmath315 before giving the proof to eq .\n( [ eq.maxbeta ] ) , we first need to present the following fact . 
given a function @xmath240 with spatial projection @xmath316 and frequency projection @xmath317\n, we construct a new function as follows @xmath318 where @xmath319 and @xmath320 are two constants such that the energy of @xmath321 is minimum , where @xmath322 denote @xmath208 , @xmath311 and @xmath323 , @xmath324 the energy ratios for @xmath304 and @xmath325 as eq .\n( [ eq.extremal ] ) , respectively .\nwe conclude that @xmath326 , @xmath327 .\nsuppose the energy of @xmath304 equals to @xmath328 and we rewrite @xmath329 as follows @xmath330 from the orthogonality principle @xcite , it follows that @xmath331 which means @xmath332 .\nmeanwhile , we have @xmath333 of @xmath334 by @xmath335 now we denote two energy for the projection of @xmath321 as follows @xmath336 the @xmath337 and @xmath338 will be simply written as @xmath339 and @xmath340 in the following , respectively . since @xmath341 , we have @xmath342 therefore , @xmath326 and @xmath327 .\nthat means , in order to get the maximum @xmath311 , we can formula a function as follows @xmath343 taking qft to both sides for eq .\n( [ eq.combi ] ) and then taking frequency projection , we have @xmath344 rearranging this formula , we obtain that @xmath345.\end{aligned}\ ] ] taking inverse qft to the above equation , we have @xmath346 on the other hand , taking the spatial projection to eq .\n( [ eq.combi ] ) , we get @xmath347 rearranging this equation , it becomes @xmath348 taking qft on both sides to the above equation , it follows that @xmath349 applying eq .\n( [ eq.inversepafts5 ] ) and eq .\n( [ eq.pafts5 ] ) , we have @xmath350 simplifying the above equality , we obtain that @xmath351 from above equality , we find that @xmath352 is one of qpswfs for eq .\n( [ eq.lowpass ] ) and the corresponding eigenvalue is @xmath353 . 
by the relationship between @xmath240 and @xmath352 in eq .\n( [ eq.fandf5 ] ) , we conclude that @xmath240 in eq .\n( [ eq.combi ] ) can be rewritten as @xmath354 now , we compute the inner product of the above equation with @xmath240 and @xmath316 respectively . since @xmath355 for @xmath240 , we have @xmath356 then we have @xmath357 and @xmath358 .\nit follows that @xmath359 with @xmath360 and @xmath361 , the parameters become @xmath362 and @xmath363 .\nthat means @xmath364 from which it follows that @xmath365 in order to get the maximal @xmath311 , we must take the largest @xmath366 .\nthe corresponding function is @xmath367 the proof is complete . until now we have discussed all the relationships of @xmath295 , as well as the signals to reach the maximum value of @xmath311 for different conditions of @xmath208 .\n-bandlimited @xmath321 associated with qft and the modulus of @xmath368 in time and qft - frequency domains .\n, width=566 ] -bandlimited @xmath321 associated with qlcts and the modulus of @xmath369 in time and qlct - frequency domains with @xmath370 .\n, width=566 ] now we give some comparison examples to intuitively illustrate the concentration levels of qpswfs associated with qlcts .\nthe widely used gaussian function will be compared with qpswfs . in theorem\n[ th.theoremexist ] , we have shown that qpswfs are the most energy concentred @xmath192-bandlimited signals . now , a @xmath192-bandlimited gaussian function is constructed at first .\nconsider the truncated gaussian function @xmath321 in qlcts - frequency domain as follows @xmath371 where @xmath372 is the qlct of @xmath321 .\nobviously , @xmath372 has unit energy .\nthis @xmath192-bandlimited gaussian function @xmath321 in spatial domain becomes @xmath373 as for the qpswfs , by means of the classical one - dimensional pswfs of zero order we now construct a special qpswf as follows @xmath374 where @xmath375 is the first one - dimensional zero order pswf . 
here , we construct the qpswf under the condition of @xmath128 .\nthe qlcts for the qpswf becomes @xmath376 for both of the @xmath192-bandlimited signals above , the energy ratios @xmath377 equal to @xmath328 in qlct - frequency domain .\nthe energy ratio pair in spatial and frequency in the comparison is noted as @xmath378 .\n-time - limited @xmath321 associated with qft and the modulus of @xmath368 in time and qft - frequency domains . , width=566 ] -time - limited @xmath321 associated with qlct and the modulus of @xmath368 in time and qlct - frequency domains with @xmath379 . , width=566 ] in fig .\n[ fig.beta1 ] and fig .\n[ fig.beta2 ] , we will show two pairs of the energy ratios @xmath13 for @xmath321 and @xmath368 in spatial domain associated with qlct with two kinds of different parameter matrices . in fig .\n[ fig.beta1 ] we set the parameter matrices of qlct @xmath380 , @xmath381 , which is already a qft . in this case , the energy ratios @xmath13 for @xmath321 and @xmath368 are very close .\nhowever , in fig .\n[ fig.beta2 ] we set the parameter matrices of qlct @xmath382 , @xmath381 . in this case , the energy ratio @xmath13 for @xmath321 is @xmath383 and @xmath13 for @xmath368 is @xmath384 . in fact , we just change the parameters @xmath385 , @xmath381 from @xmath386 to @xmath387 .\nthat means , for qpswfs the energy is more concentred then truncated gaussian function . as for the @xmath168 time - limited function , there are the similar results like @xmath192-bandlimited cases .\nwe also list two pairs of the energy ratios @xmath377 for @xmath321 and @xmath368 in qlct - frequency domains in fig .\n[ fig.xi1 ] and fig .\n[ fig.xi2 ] . in fig .\n[ fig.xi1 ] we also set the parameter matrices of qlct to be the qft .\nthe parameter matrices of qlct in fig .\n[ fig.xi2 ] is the same as that in fig .\n[ fig.beta2 ] . 
in this two pair cases\n, you may see the energy ratios @xmath377 for @xmath321 and @xmath368 are very close .\nbut one more thing different from fig .\n[ fig.beta1 ] and fig .\n[ fig.beta2 ] is that the energy ratios @xmath377 for @xmath321 and @xmath368 associated with qft is smaller than the energy ratios @xmath377 for @xmath321 and @xmath368 associated with the second parameter matrices .\nthat means , the parameter matrices of qlct is vary important . in some sense , for specific conditions the results for qlct will be better than qft .\nthis paper presented a new generalization of pswfs , namely qpswfs , which are the optimal @xmath25-valued signals for the energy concentration problem associated with the qlcts .\nwe developed the definition of the qpswfs associated with qlcts and established various properties of them . in order to find the energy distribution of @xmath295 for any @xmath25-valued signals , we not only derive the parseval identity associated with ( two - sided ) qlcts , but\nalso show that the maximum @xmath208 for @xmath192-bandlimited signals associated with qlcts in a fixed spatial domain must be qpswfs .\nthe authors acknowledges financial support from the national natural science foundation of china under grant ( no .\n11401606,11501015 ) , university of macau ( no . myrg2015 - 00058-fst and no . myrg099(y1-l2)-fst13-kki ) and the macao science and technology development fund ( no . fdct/094/2011/a and no .\nfdct/099/2012/a3 ) .\n10 url # 1`#1`urlprefixhref # 1#2#2 # 1#1 h. j. landau and h.o . pollak .\n_ prolate spheroidal wave functions , fourier analysis and uncertainty iii : the dimension of space of essentially time - and bandlimited signals _ , bell system technical journal , 41(4 ) , 12951336 ( 1962 ) .\nd. 
slepian .\n_ prolate spheroidal wave functions , fourier analysis and uncertainty iv : extensions to many dimensions ; generalized prolate spheroidal functions _ , bell system technical journal , 43(6 ) , 30093057 ( 1964 ) .\nj. j. ding and s. c. pei .\n_ reducing sampling error by prolate spheroidal wave functions and fractional fourier transform _ , in proceedings of the ieee international conference on acoustics , speech , and signal processing , 217220 ( 2005 ) .\nj. morais , k. kou , and y. zhang .\n_ generalized prolate spheroidal wave functions for offset linear canonical transform in clifford analysis _ , mathematical methods in the applied sciences , 36(9 ) , 10281041 ( 2013 ) .\ne. b. corrochano , n. trujillo , and m. naranjo .\n_ quaternion fourier descriptors for preprocessing and recognition of spoken words using images of spatiotemporal representations _ , mathematical imaging and vision , 28 , 179190 ( 2007 ) .\np. bas , n. lebihan , and j. m. chassery .\n_ color image water marking using quaternion fourier transform _ , in proceedings of the ieee international conference on acoustics , speech and signal processing , 521524 ( 2003 ) .","quaternionic linear canonical transforms ( qlcts ) are a family of integral transforms , which generalized the quaternionic fourier transform and quaternionic fractional fourier transform . in this paper , we extend the energy concentration problem for 2d hypercomplex signals ( especially quaternionic signals ) . \n the most energy concentrated signals both in 2d spatial and quaternionic linear canonical frequency domains simultaneously are recently recognized to be the quaternionic prolate spheroidal wave functions ( qpswfs ) . \n the improved definitions of qpswfs are studied which gave reasonable properties . \n the purpose of this paper is to understand the measurements of energy concentration in the 2d spatial and quaternionic linear canonical frequency domains . 
\n examples of energy concentrated ratios between the truncated gaussian function and qpswfs intuitively illustrate that qpswfs are more energy concentrated signals . \n quaternionic linear canonical transforms , energy concentration , quaternionic fourier transform , quaternionic prolate spheroidal wave functions .",introduction\nquaternionic algebra\n the quaternionic linear canonical transforms (qlcts)\nthe quaternionic prolate spheroidal wave functions\nmain results\nconclusion\nacknowledgments\nreferences


We can see that the input data is the `article`, a scientific report, and the target data is the `abstract`, a concise summary of that report.

Cool! Having downloaded the dataset, let's tokenize it.
We'll import the convenient `AutoTokenizer` class.

In [None]:
from transformers import AutoTokenizer

Then we load the tokenizer.

In [None]:
tokenizer = AutoTokenizer.from_pretrained("allenai/led-base-16384")


Note that for the sake of this notebook, we finetune the "smaller" LED checkpoint ["allenai/led-base-16384"](https://huggingface.co/allenai/led-base-16384). Better performance can however be attained by finetuning ["allenai/led-large-16384"](https://huggingface.co/allenai/led-large-16384) at the cost of a higher required GPU RAM.

Pubmed's input data has a median token length of 2715, with the 90th-percentile token length being 6101. The output data has a median token length of 171, with the 90th-percentile token length being 352.${}^1$

Thus, we set the maximum input length to 8192 and the maximum output length to 512 to ensure that the model can attend to almost all input tokens and is able to generate a large enough number of output tokens.

In this notebook, we are only able to train on `batch_size=2` to prevent out-of-memory errors.

---
${}^1$ The data is taken from page 11 of [Big Bird: Transformers for Longer Sequences](https://arxiv.org/pdf/2007.14062.pdf).
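
To make the choice of a maximum length more concrete, here is a minimal sketch that measures how many samples a given limit would truncate. The token counts below are invented for illustration; in practice they could be collected with something like `len(tokenizer(text).input_ids)` for each article, which is an assumption about how one would gather them and not part of this notebook.

```python
def truncated_fraction(token_lengths, max_length):
    """Return the fraction of samples that are longer than max_length tokens."""
    if not token_lengths:
        raise ValueError("token_lengths must not be empty")
    too_long = sum(1 for n in token_lengths if n > max_length)
    return too_long / len(token_lengths)


# Hypothetical token counts, invented for illustration only.
example_lengths = [1200, 2715, 3400, 6101, 7900, 9500, 12000, 500, 4100, 8192]

for limit in (4096, 8192):
    share = truncated_fraction(example_lengths, limit)
    print(f"max_length={limit}: {share:.0%} of samples would be truncated")
```

A limit is a reasonable choice when the truncated share is small, which is the argument made above for 8192 input tokens.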


In [None]:
max_input_length = 8192
max_output_length = 512
batch_size = 2

Now, let's write down the input data processing function that will be used to map each data sample to the correct model format.
As explained earlier, `article` is our input data and `abstract` is the target data. The data samples are thus tokenized up to the respective maximum lengths of 8192 and 512.

In addition to the usual `attention_mask`, LED can make use of an additional `global_attention_mask` defining which input tokens are attended to globally and which are attended to only locally, just as is the case for [Longformer](https://huggingface.co/transformers/model_doc/longformer.html). For more information on Longformer's self-attention, please take a look at the corresponding [docs](https://huggingface.co/transformers/model_doc/longformer.html#longformer-self-attention). For summarization, we follow the recommendations of the [paper](https://arxiv.org/abs/2004.05150) and use global attention only for the very first token. Finally, we make sure that no loss is computed on padded tokens by setting their index to `-100`.

In [None]:
def process_data_to_model_inputs(batch):
    # tokenize the inputs and labels
    inputs = tokenizer(
        batch["article"],
        padding="max_length",
        truncation=True,
        max_length=max_input_length,
    )
    outputs = tokenizer(
        batch["abstract"],
        padding="max_length",
        truncation=True,
        max_length=max_output_length,
    )

    batch["input_ids"] = inputs.input_ids
    batch["attention_mask"] = inputs.attention_mask

    # create 0 global_attention_mask lists
    batch["global_attention_mask"] = len(batch["input_ids"]) * [
        [0 for _ in range(len(batch["input_ids"][0]))]
    ]

    # since above lists are references, the following line changes the 0 index for all samples
    batch["global_attention_mask"][0][0] = 1
    batch["labels"] = outputs.input_ids

    # We have to make sure that the PAD token is ignored
    batch["labels"] = [
        [-100 if token == tokenizer.pad_token_id else token for token in labels]
        for labels in batch["labels"]
    ]

    return batch
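
The function above relies on every row of `global_attention_mask` being a reference to the same inner list, so a single assignment marks the first token of all samples. As a sketch of the same idea on toy data, the version below builds one independent row per sample and masks the padding in the labels. The helper names and the pad id of `1` are assumptions made for this example.

```python
PAD_TOKEN_ID = 1  # assumed pad id for this toy example
IGNORE_INDEX = -100  # label value that is excluded from the loss


def build_global_attention_mask(input_ids):
    """Build one independent mask row per sample, global attention on token 0 only."""
    return [[1] + [0] * (len(row) - 1) for row in input_ids]


def mask_padding_in_labels(label_ids, pad_token_id=PAD_TOKEN_ID):
    """Replace padding ids with IGNORE_INDEX so they do not contribute to the loss."""
    return [
        [IGNORE_INDEX if token == pad_token_id else token for token in row]
        for row in label_ids
    ]


# Toy, already padded token ids (invented for illustration).
toy_input_ids = [[0, 11, 12, 2, 1, 1], [0, 21, 22, 23, 24, 2]]
toy_label_ids = [[0, 31, 2, 1], [0, 41, 42, 2]]

print(build_global_attention_mask(toy_input_ids))
print(mask_padding_in_labels(toy_label_ids))
```

Because each row is its own list here, changing the mask of one sample later cannot silently change the others.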

Great, having defined the mapping function, let's preprocess the training data

In [None]:
train_dataset = train_dataset.map(
    process_data_to_model_inputs,
    batched=True,
    batch_size=batch_size,
    remove_columns=["article", "abstract", "section_names"],
)

and validation data

In [None]:
val_dataset = val_dataset.map(
    process_data_to_model_inputs,
    batched=True,
    batch_size=batch_size,
    remove_columns=["article", "abstract", "section_names"],
)

Finally, the datasets should be converted into the PyTorch format as follows.

In [None]:
train_dataset.set_format(
    type="torch",
    columns=["input_ids", "attention_mask", "global_attention_mask", "labels"],
)
val_dataset.set_format(
    type="torch",
    columns=["input_ids", "attention_mask", "global_attention_mask", "labels"],
)

Alright, we're almost ready to start training. Let's load the model via the `AutoModelForSeq2SeqLM` class.

In [None]:
from transformers import AutoModelForSeq2SeqLM

We've decided to stick to the smaller model `"allenai/led-base-16384"` for the sake of this notebook. In addition, we directly enable gradient checkpointing and disable the caching mechanism to save memory.

In [None]:
led = AutoModelForSeq2SeqLM.from_pretrained("allenai/led-base-16384", gradient_checkpointing=True, use_cache=False)

During training, we want to evaluate the model on Rouge, the most common metric used in summarization, to make sure the model is indeed improving during training. For this, we set fitting generation parameters. We'll use beam search with a small beam of just 2 to save memory. Also, we force the model to generate at least 100 tokens, but no more than 512. In addition, some other generation parameters are set that have been found helpful for generation. For more information on those parameters, please take a look at the [docs](https://huggingface.co/transformers/main_classes/model.html?highlight=generate#transformers.generation_utils.GenerationMixin.generate).

In [None]:
# set generate hyperparameters
led.config.num_beams = 2
led.config.max_length = 512
led.config.min_length = 100
led.config.length_penalty = 2.0
led.config.early_stopping = True
led.config.no_repeat_ngram_size = 3
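
To illustrate what `no_repeat_ngram_size = 3` is meant to achieve, the sketch below computes which next tokens would complete a 3-gram that already occurs in the generated sequence. It is a simplified illustration of the idea, not the implementation used by `generate()`, and the function name `banned_next_tokens` is made up for this example.

```python
def banned_next_tokens(generated, ngram_size):
    """Return the tokens that would complete an n-gram already present in `generated`."""
    prefix_len = ngram_size - 1
    if ngram_size < 1 or len(generated) < prefix_len:
        return set()
    # The last (n - 1) generated tokens form the prefix of the next n-gram.
    prefix = tuple(generated[len(generated) - prefix_len:])
    banned = set()
    for start in range(len(generated) - ngram_size + 1):
        ngram = tuple(generated[start:start + ngram_size])
        if ngram[:-1] == prefix:
            banned.add(ngram[-1])
    return banned


tokens = ["the", "model", "is", "trained", "and", "the", "model"]
print(banned_next_tokens(tokens, ngram_size=3))
```

Here the sequence ends in "the model", and "the model is" was already generated, so "is" is the only token that would be blocked at the next step.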

Next, we also have to define the function that will compute the `"rouge"` score during evaluation.

Let's load the `"rouge"` metric from 🤗datasets and define the `compute_metrics(...)` function.

In [None]:
rouge = load_metric("rouge")

The compute metrics function expects the generation output, called `pred.predictions` as well as the gold label, called `pred.label_ids`.

Those tokens are decoded and consequently, the rouge score can be computed.

In [None]:
def compute_metrics(pred):
    labels_ids = pred.label_ids
    pred_ids = pred.predictions

    pred_str = tokenizer.batch_decode(pred_ids, skip_special_tokens=True)
    labels_ids[labels_ids == -100] = tokenizer.pad_token_id
    label_str = tokenizer.batch_decode(labels_ids, skip_special_tokens=True)

    rouge_output = rouge.compute(
        predictions=pred_str, references=label_str, rouge_types=["rouge2"]
    )["rouge2"].mid

    return {
        "rouge2_precision": round(rouge_output.precision, 4),
        "rouge2_recall": round(rouge_output.recall, 4),
        "rouge2_fmeasure": round(rouge_output.fmeasure, 4),
    }
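
For intuition about the three numbers returned above, here is a simplified bigram-overlap computation in plain Python. It is only a sketch: it splits on whitespace and lowercases the text, whereas the `"rouge"` metric applies its own tokenization, so the values will not match exactly. The function names are made up for this example.

```python
from collections import Counter


def bigrams(text):
    """Count the bigrams of a lowercased, whitespace-tokenized text."""
    tokens = text.lower().split()
    return Counter(zip(tokens, tokens[1:]))


def simple_rouge2(prediction, reference):
    """Simplified ROUGE-2: clipped bigram overlap between prediction and reference."""
    pred, ref = bigrams(prediction), bigrams(reference)
    overlap = sum((pred & ref).values())  # Counter intersection clips the counts
    precision = overlap / max(sum(pred.values()), 1)
    recall = overlap / max(sum(ref.values()), 1)
    denom = precision + recall
    fmeasure = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "fmeasure": fmeasure}


scores = simple_rouge2(
    "the model summarizes long documents",
    "the model summarizes very long documents well",
)
print(scores)
```

Precision is the share of predicted bigrams that also appear in the reference, recall is the share of reference bigrams that were predicted, and the F-measure is their harmonic mean.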

Now, we're ready to start training. Let's import the `Seq2SeqTrainer` and `Seq2SeqTrainingArguments`.

In contrast to the usual `Trainer`, the `Seq2SeqTrainer` makes it possible to use the `generate()` function during evaluation. This should be enabled with `predict_with_generate=True`. Because our GPU RAM is limited, we make use of gradient accumulation by setting `gradient_accumulation_steps=4` to have an effective `batch_size` of 2 * 4 = 8.

Other training arguments can be looked up in the [docs](https://huggingface.co/transformers/main_classes/trainer.html?highlight=trainingarguments#transformers.TrainingArguments).
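
The arithmetic behind gradient accumulation can be sketched as follows. The step count assumes a single device and that a trailing, incomplete accumulation group is not counted as an update; both are assumptions of this example, as are the function names and the sample count of 1000.

```python
import math


def effective_batch_size(per_device_batch_size, gradient_accumulation_steps, num_devices=1):
    """Number of samples that contribute to one optimizer update."""
    return per_device_batch_size * gradient_accumulation_steps * num_devices


def optimizer_steps_per_epoch(num_samples, per_device_batch_size, gradient_accumulation_steps):
    """Approximate number of optimizer updates per epoch on a single device."""
    batches = math.ceil(num_samples / per_device_batch_size)
    return max(batches // gradient_accumulation_steps, 1)


print(effective_batch_size(2, 4))
# Hypothetical dataset size, chosen only to illustrate the calculation.
print(optimizer_steps_per_epoch(1000, 2, 4))
```

With `batch_size=2` and `gradient_accumulation_steps=4`, the weights are updated once for every 8 samples, while only 2 samples have to fit in GPU memory at a time.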

In [None]:
!pip install accelerate -U




In [None]:
!pip install transformers[torch]



In [None]:
! pip install -U accelerate
! pip install -U transformers



In [None]:
import accelerate
import transformers
from transformers import Seq2SeqTrainingArguments
from transformers import Seq2SeqTrainer


transformers.__version__, accelerate.__version__

('4.31.0', '0.21.0')


In [None]:
# enable fp16 mixed-precision training
training_args = Seq2SeqTrainingArguments(
    predict_with_generate=True,
    evaluation_strategy="steps",
    per_device_train_batch_size=batch_size,
    per_device_eval_batch_size=batch_size,
    fp16=True,
    output_dir="./",
    logging_steps=5,
    eval_steps=10,
    save_steps=10,
    save_total_limit=2,
    gradient_accumulation_steps=4,
    num_train_epochs=1,
)

In [None]:
trainer = Seq2SeqTrainer(
    model=led,
    tokenizer=tokenizer,
    args=training_args,
    compute_metrics=compute_metrics,
    train_dataset=train_dataset,
    eval_dataset=val_dataset,
)

In [None]:
trainer.train()

You're using a LEDTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.


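| Step | Training Loss | Validation Loss | Rouge2 Precision | Rouge2 Recall | Rouge2 Fmeasure |
|------|---------------|-----------------|------------------|---------------|-----------------|
| 10 | 3.19 | 3.133845 | 0.1301 | 0.0848 | 0.0978 |
| 20 | 3.1695 | 2.928536 | 0.0887 | 0.1364 | 0.1002 |
| 30 | 2.7676 | 2.813875 | 0.1426 | 0.096 | 0.1096 |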




TrainOutput(global_step=37, training_loss=3.161574634345802, metrics={'train_runtime': 2867.3299, 'train_samples_per_second': 0.105, 'train_steps_per_second': 0.013, 'total_flos': 1598521262211072.0, 'train_loss': 3.161574634345802, 'epoch': 0.99})

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive

In [None]:
import torch

# Save the model's state dictionary to Google Drive
# (the same path is used when the weights are loaded again)
torch.save(led.state_dict(), "/content/drive/MyDrive/project.pt")


In [None]:
a_test = load_dataset("scientific_papers", "arxiv", ignore_verifications=True, split="test")
arxiv_test = a_test.select(range(5))


def generate_answer(batch):
    inputs_dict = tokenizer(batch["article"], padding="max_length", max_length=8192, return_tensors="pt", truncation=True)
    input_ids = inputs_dict.input_ids.to("cuda")
    attention_mask = inputs_dict.attention_mask.to("cuda")
    global_attention_mask = torch.zeros_like(attention_mask)
    # put global attention on <s> token
    global_attention_mask[:, 0] = 1

    predicted_abstract_ids = led.generate(input_ids, attention_mask=attention_mask, global_attention_mask=global_attention_mask)
    batch["predicted_abstract"] = tokenizer.batch_decode(predicted_abstract_ids, skip_special_tokens=True)
    return batch


result = arxiv_test.map(generate_answer, batched=True, batch_size=1)
# load rouge
rouge = load_metric("rouge")

rouge_types = ["rouge1", "rouge2", "rougeL"]

rouge_scores = rouge.compute(predictions=result["predicted_abstract"], references=result["abstract"], rouge_types=rouge_types)

for rouge_type in rouge_types:
    print(f"Rouge-{rouge_type} scores:")
    print("F-score:", rouge_scores[rouge_type].mid.fmeasure)
    print("Precision:", rouge_scores[rouge_type].mid.precision)
    print("Recall:", rouge_scores[rouge_type].mid.recall)
    print()






Rouge-rouge1 scores:
F-score: 0.4323790276846935
Precision: 0.47399344569288393
Recall: 0.4141992220041001

Rouge-rouge2 scores:
F-score: 0.1949194768632888
Precision: 0.21429141418078818
Recall: 0.18324093292129823

Rouge-rougeL scores:
F-score: 0.2668515824458856
Precision: 0.2882063780568407
Recall: 0.2565145764091713



In [None]:
!pip install transformers


In [None]:
import torch
from transformers import LEDForConditionalGeneration, LEDTokenizer

# Load the tokenizer
tokenizer = LEDTokenizer.from_pretrained("allenai/led-base-16384")

# Create the model architecture and move it to the GPU
led = LEDForConditionalGeneration.from_pretrained("allenai/led-base-16384").to("cuda")

# Load the model's state dictionary
led.load_state_dict(torch.load("/content/drive/MyDrive/project.pt", map_location="cuda"))


def generate_output(input_text, max_length=1024, min_length=50):
    input_ids = tokenizer(input_text, max_length=8192, return_tensors="pt", truncation=True).input_ids.to("cuda")
    attention_mask = torch.ones_like(input_ids).to("cuda")
    global_attention_mask = torch.zeros_like(attention_mask)
    # put global attention on <s> token
    global_attention_mask[:, 0] = 1

    predicted_abstract_ids = led.generate(
        input_ids,
        attention_mask=attention_mask,
        global_attention_mask=global_attention_mask,
        max_length=max_length,
        min_length=min_length,
    )
    predicted_abstract = tokenizer.batch_decode(predicted_abstract_ids, skip_special_tokens=True)
    return predicted_abstract[0]


input_text = "Flow matching is a recent framework to train generative models that exhibits impressive empirical performance while being relatively easier to train compared with diffusion-based models. Despite its advantageous properties, prior methods still face the challenges of expensive computing and a large number of function evaluations of off-the-shelf solvers in the pixel space. Furthermore, although latentbased generative methods have shown great success in recent years, this particular model type remains underexplored in this area. In this work, we propose to apply flow matching in the latent spaces of pretrained autoencoders, which offers improved computational efficiency and scalability for high-resolution image synthesis. This enables flow-matching training on constrained computational resources while maintaining their quality and flexibility. Additionally, our work stands as a pioneering contribution in the integration of various conditions into flow matching for conditional generation tasks, including label-conditioned image generation, image inpainting, and semantic-to-image generation. Through extensive experiments, our approach demonstrates its effectiveness in both quantitative and qualitative results on various datasets, such as CelebA-HQ, FFHQ, LSUN Church & Bedroom, and ImageNet. We also provide a theoretical control of the Wasserstein-2 distance between the reconstructed latent flow distribution and true data distribution, showing it is upper-bounded by the latent flow matching objective."
max_length = 150  # Set the maximum length for the generated output
min_length = 20   # Set the minimum length for the generated output
output_text = generate_output(input_text, max_length=max_length, min_length=min_length)
print("Generated output:")
print(output_text)


Generated output:
Flow matching is a recent framework to train generative models that exhibits impressive empirical performance while being relatively easier to train compared with diffusion-based models. Despite its advantageous properties, prior methods still face the challenges of expensive computing and a large number of function evaluations of off-the-shelf solvers in the pixel space. Furthermore, although latentbased generative methods have shown great success in recent years, this particular model type remains underexplored in this area. In this work, we propose to apply flow matching in the latent spaces of pretrained autoencoders, which offers improved computational efficiency and scalability for high-resolution image synthesis. This enables flow-matching training on constrained computational resources while maintaining their quality and flexibility


In [None]:
from transformers import pipeline
summarizer = pipeline(task="summarization", model="facebook/bart-large-cnn")

input_text = """
Flow matching is a recent framework to train generative models that exhibits impressive empirical performance while being relatively easier to train compared with diffusion-based models. Despite its advantageous properties, prior methods still face the challenges of expensive computing and a large number of function evaluations of off-the-shelf solvers in the pixel space. Furthermore, although latentbased generative methods have shown great success in recent years, this particular model type remains underexplored in this area. In this work, we propose to apply flow matching in the latent spaces of pretrained autoencoders, which offers improved computational efficiency and scalability for high-resolution image synthesis. This enables flow-matching training on constrained computational resources while maintaining their quality and flexibility. Additionally, our work stands as a pioneering contribution in the integration of various conditions into flow matching for conditional generation tasks, including label-conditioned image generation, image inpainting, and semantic-to-image generation. Through extensive experiments, our approach demonstrates its effectiveness in both quantitative and qualitative results on various datasets, such as CelebA-HQ, FFHQ, LSUN Church & Bedroom, and ImageNet. We also provide a theoretical control of the Wasserstein-2 distance between the reconstructed latent flow distribution and true data distribution, showing it is upper-bounded by the latent flow matching objective.
"""

summary = summarizer(input_text, max_length=150, min_length=20, do_sample=True)[0]['summary_text']
print(summary)


Flow matching is a framework to train generative models. It offers improved computational efficiency and scalability for high-resolution image synthesis. We propose to apply it in the latent spaces of pretrained autoencoders.


#NOT IMPORTANT

In [None]:
import torch
from transformers import LEDForConditionalGeneration, LEDTokenizer

# Load the tokenizer
tokenizer = LEDTokenizer.from_pretrained("allenai/led-base-16384")

# Create the model architecture and move it to the GPU
led = LEDForConditionalGeneration.from_pretrained("allenai/led-base-16384").to("cuda")

# Load the model's state dictionary
led.load_state_dict(torch.load("/content/drive/MyDrive/project.pt", map_location="cuda"))


def generate_output(input_text, max_length=1024, min_length=50):
    input_ids = tokenizer(input_text, max_length=8192, return_tensors="pt", truncation=True).input_ids.to("cuda")
    attention_mask = torch.ones_like(input_ids).to("cuda")
    global_attention_mask = torch.zeros_like(attention_mask)
    # put global attention on <s> token
    global_attention_mask[:, 0] = 1

    predicted_abstract_ids = led.generate(
        input_ids,
        attention_mask=attention_mask,
        global_attention_mask=global_attention_mask,
        max_length=max_length,
        min_length=min_length,
    )
    predicted_abstract = tokenizer.batch_decode(predicted_abstract_ids, skip_special_tokens=True)
    return predicted_abstract[0]


input_text = "nAs the urgency to reduce greenhouse gas emissions grows, Microgrids(MGs) are emerging as an effective and efficient solution for integrating distributed renewable resources [1]. Compared to centralized power grids, MGs leverage localized generation, minimizing transmission losses and eliminating the need for significant infrastructure adjustments. Another noteworthy advantage of MGs is their ability to enhance grid resilience by operating in “islanded mode,” enabling selfsustainability during outages [2]. Furthermore, MGs play a crucial role in electrifying rural areas, leading to a substantial surge in their numbers [3].\nThe energy management of a MG, involving distributed generation units (both conventional and renewable) and energy storage units, requires making sequential decisions in the face of uncertainties introduced by renewable energy sources. The optimal dispatch for MG energy management under these\nM. V. Liu is with the Field of Systems Engineering, Cornell University, Ithaca, NY, 14853, USA (e-mail: ml2589@cornell.edu)\nP. M. Reed and D. Gold are with Civil and Environmental Engineering, Cornell University, Ithaca, NY, 14853, USA (e-mail: patrick.reed@cornell.edu; dfg42@cornell.edu).\nG. Quist is with the Facilities and Campus Service, Cornell University, Ithaca, NY, 14853, USA (e-mail: gq29@cornell.edu).\nC. L. Anderson is with the Field of Systems Engineering and Cornell Energy Systems Institute, Cornell University, Ithaca, NY, 14853, USA (email: cla28@cornell.edu)\nuncertainties is often addressed by representing the control problem as a Markov Decision Process (MDP) [4], [5]. The MDP represents a sequential decision process in which an agent interacts with an environment, making decisions at each time step to maximize its long-term expected rewards [6]. 
Traditionally, the MDP control formulations have been tackled using the Dynamic Programming (DP) methods, which are often burdened by the “curse of dimensionality,” limiting the size of the state and action spaces that can be addressed [7]. Moreover, traditional MDP formulations require knowledge of the “model”, which refers to the transition probability and reward function for given state-action pairs for the system of consideration. Unfortunately, in energy systems these transition probabilities are often not known due to the complexity and non-linearity of the systems, prompting the need for a “model-free” approach.\nTo overcome these issues, Reinforcement Learning (RL) has gained popularity in power system operation and control due to its model-free nature [8]. RL, also known as model-free approximate dynamic programming (ADP), has been combined with artificial neural networks (ANNs) for function approximations, leading to the development of Deep RL (DRL) methods [9]. Numerous advancements have been made in employing (D)RL algorithms to enhance demandside energy management [7], [10], [11]. For a comprehensive overview of RL in the context of power system operations, interested readers can refer to [12]. A common practice of these studies is to take advantage of the data-driven property of RL and use historical data to train the agent by simulating its interaction with the environment without oversimplifying the state and action spaces. However, prior studies have primarily focused on a single objective, typically maximizing profit (minimizing cost), due to the inherent nature of the underlying Bellman equations formulation. In fact, methods focused on multiple objectives are exceedingly rare in the RL and ADP bodies of literature [13], [14]. As the share of renewable energy increases in MGs, other objectives, such as environmental impacts, operation reliability, and effective storage operation, are receiving more attention [15]. 
A sole focus on a single objective could lead to control solutions in the extreme corners of the broader space of relevant performance objectives and fails to properly represent the interests of stakeholders, especially when conflicts exist between their objective functions [16], [17].\nIn addressing multiple objectives in MG optimization, a standard approach is to employ a weighted sum method to convert the original multiple objective formulations into a single objective representation that tacitly infers that the specified\nar X\niv :2\n30 7.\n08 69\n2v 1\n[ ee\nss .S\nY ]\n1 7\nJu l 2\n02 3\n2 weights capture all stakeholders’ preferences. The weighted single objective is the dominant approach for handling multiple objectives in DP formulations [18]. If a user is interested in an explicit representation of the full suite of objective tradeoffs, solving repeated weighted DP instances can become computationally prohibitive. Identifying all of the Pareto optimal solutions1 that compose control tradeoffs becomes increasingly challenging as the number of objectives grows, leading to a factorial growth in the computation cost [19]. In [20]–[23], the weighted sum approach is combined with fuzzy techniques to limit the number of combinations of weights, reducing the complexity of the computation. However, these methods struggle to effectively explore the tradeoffs between the objectives. The predefined preferences for objectives can potentially overlook superior solutions, particularly in scenarios where system dynamics are non-convex and non-linear [24]. This limitation hampers the ability to find optimal solutions that strike the best balance among conflicting objectives.\nMeta-heuristic methods such as the Non-dominated Sorting Genetic Algorithm (NSGA-II) [25] and Multiple Objective Particle Swarm Optimization (MOPSO) [26] have been employed in optimizing MG operation to allow for an explicit search of the Pareto-optimal solutions [27], [28]. 
These multiobjective optimization algorithms use population sorting techniques to guide search and maintain a set of non-dominated solutions, resulting in a diverse set of Pareto-optimal solutions for decision-makers without specifying predefined preferences for objectives. In addition, as simulation-based optimization, the meta-heuristic methods can handle more complex system formulations that are non-linear and non-convex, making them a good candidate class of solution tools for the demand-side energy management problem [29].\nIn this paper, we present a novel framework that combines the strengths of multi-objective optimization and RL to tackle the energy management problem of MGs. We use the Borg Multi-Objective Evolutionary Algorithm (MOEA) [30], which has been proven to meet or exceed the performance of other MOEAs in complex system planning and operation applications [31]–[33], explicitly exploring their tradeoffs in the higher-dimensional objective spaces. Our framework proposes a model-free policy approximation approach, enabling the agent to interact with the unknown environment in continuous state and action spaces while managing the computation complexity of the stochastic MG control problem of focus. The proposed Multi-Objective Reinforcement Learning (MORL) framework is applied to the challenging abstraction of the Combined Heat and Power (CHP) CU-MG, demonstrating its efficacy in a real-world application. Using historical weather, demand, and utility price data, the trained agent (policy) makes collaborative decisions for multiple energy sources under stochastic state conditions. 
This model-free policy approximation approach not only reduces computational complexity but also provides interpretability in the trained parametric policy, allowing us to understand how the agent utilizes exogenous information to shape its dynamic and adaptive control actions.\n1Pareto-optimal solution refers to a solution where no objective can be improved without sacrificing another objective.\nThe primary contributions of this study are as follows: • a novel framework that combines the power of multi-\nobjective optimization and RL to support MG energy management under uncertainty, • a data-driven model-free policy approximation that reduces computation complexity while providing interpretability to the policy trained under multiple conflicting objectives, and • demonstrated performance on an existing CHP MG to highlight the tradeoffs between multiple objectives and the importance of exogenous information in improving stochastic decisions relative to current operations.\nThe rest of this paper is organized as follows. Section II introduces the real-world CU-MG and the system modeling formulations. Section III presents the proposed MORL framework and formulates the multi-objective energy management problem under it. Section IV demonstrates the numerical results for the test case. Section V concludes this paper."

max_length = 200  # Set the maximum length for the generated output
min_length = 20   # Set the minimum length for the generated output
output_text = generate_output(input_text, max_length=max_length, min_length=min_length)
print("Generated output:")
print(output_text)


Generated output:
The power system operations of the Combined Heat and Power (CHP) CU-MG are being studied under uncertainty. The problem of MG energy management is often addressed by using a multi-objective optimization approach to solve the problem of multiple objectives. The problem of MG energy management is often addressed by using a model-free policy approximation approach to the problem of multiple objectives. However, the problem of MG energy management is often addressed by using a model-free policy approximation approach to the problem of multiple objectives. The problem of MG energy management is often addressed by using a multi-objective optimization approach to solve the problem of multiple objectives. The problem of MG energy management is often addressed by using a multi-objective optimization approach to solve the problem of multiple objectives [1]. The problem of MG energy management is often addressed by using a multi-objective optimization approach to the problem of 

In [None]:
from transformers import BartTokenizer, BartForConditionalGeneration

model_name = "facebook/bart-large-cnn"
tokenizer = BartTokenizer.from_pretrained(model_name)
model = BartForConditionalGeneration.from_pretrained(model_name)

def generate_summary(input_text, max_length=512):
    input_text = input_text.strip()
    tokenized_text = tokenizer.encode(input_text, add_special_tokens=True, max_length=max_length, truncation=True, return_tensors="pt")
    summary_ids = model.generate(tokenized_text, num_beams=4, max_length=200, min_length=20, early_stopping=True)
    summary = tokenizer.decode(summary_ids[0], skip_special_tokens=True)
    return summary

input_text = """
    As the urgency to reduce greenhouse gas emissions grows, Microgrids(MGs) are emerging as an effective and efficient solution for integrating distributed renewable resources [1]. Compared to centralized power grids, MGs leverage localized generation, minimizing transmission losses and eliminating the need for significant infrastructure adjustments. Another noteworthy advantage of MGs is their ability to enhance grid resilience by operating in “islanded mode,” enabling selfsustainability during outages [2]. Furthermore, MGs play a crucial role in electrifying rural areas, leading to a substantial surge in their numbers [3].\nThe energy management of a MG, involving distributed generation units (both conventional and renewable) and energy storage units, requires making sequential decisions in the face of uncertainties introduced by renewable energy sources. The optimal dispatch for MG energy management under these\nM. V. Liu is with the Field of Systems Engineering, Cornell University, Ithaca, NY, 14853, USA (e-mail: ml2589@cornell.edu)\nP. M. Reed and D. Gold are with Civil and Environmental Engineering, Cornell University, Ithaca, NY, 14853, USA (e-mail: patrick.reed@cornell.edu; dfg42@cornell.edu).\nG. Quist is with the Facilities and Campus Service, Cornell University, Ithaca, NY, 14853, USA (e-mail: gq29@cornell.edu).\nC. L. Anderson is with the Field of Systems Engineering and Cornell Energy Systems Institute, Cornell University, Ithaca, NY, 14853, USA (email: cla28@cornell.edu)\nuncertainties is often addressed by representing the control problem as a Markov Decision Process (MDP) [4], [5]. The MDP represents a sequential decision process in which an agent interacts with an environment, making decisions at each time step to maximize its long-term expected rewards [6]. 
Traditionally, the MDP control formulations have been tackled using the Dynamic Programming (DP) methods, which are often burdened by the “curse of dimensionality,” limiting the size of the state and action spaces that can be addressed [7]. Moreover, traditional MDP formulations require knowledge of the “model”, which refers to the transition probability and reward function for given state-action pairs for the system of consideration. Unfortunately, in energy systems these transition probabilities are often not known due to the complexity and non-linearity of the systems, prompting the need for a “model-free” approach.\nTo overcome these issues, Reinforcement Learning (RL) has gained popularity in power system operation and control due to its model-free nature [8]. RL, also known as model-free approximate dynamic programming (ADP), has been combined with artificial neural networks (ANNs) for function approximations, leading to the development of Deep RL (DRL) methods [9]. Numerous advancements have been made in employing (D)RL algorithms to enhance demandside energy management [7], [10], [11]. For a comprehensive overview of RL in the context of power system operations, interested readers can refer to [12]. A common practice of these studies is to take advantage of the data-driven property of RL and use historical data to train the agent by simulating its interaction with the environment without oversimplifying the state and action spaces. However, prior studies have primarily focused on a single objective, typically maximizing profit (minimizing cost), due to the inherent nature of the underlying Bellman equations formulation. In fact, methods focused on multiple objectives are exceedingly rare in the RL and ADP bodies of literature [13], [14]. As the share of renewable energy increases in MGs, other objectives, such as environmental impacts, operation reliability, and effective storage operation, are receiving more attention [15]. 
A sole focus on a single objective could lead to control solutions in the extreme corners of the broader space of relevant performance objectives and fails to properly represent the interests of stakeholders, especially when conflicts exist between their objective functions [16], [17].\nIn addressing multiple objectives in MG optimization, a standard approach is to employ a weighted sum method to convert the original multiple objective formulations into a single objective representation that tacitly infers that the specified\nar X\niv :2\n30 7.\n08 69\n2v 1\n[ ee\nss .S\nY ]\n1 7\nJu l 2\n02 3\n2 weights capture all stakeholders’ preferences. The weighted single objective is the dominant approach for handling multiple objectives in DP formulations [18]. If a user is interested in an explicit representation of the full suite of objective tradeoffs, solving repeated weighted DP instances can become computationally prohibitive. Identifying all of the Pareto optimal solutions1 that compose control tradeoffs becomes increasingly challenging as the number of objectives grows, leading to a factorial growth in the computation cost [19]. In [20]–[23], the weighted sum approach is combined with fuzzy techniques to limit the number of combinations of weights, reducing the complexity of the computation. However, these methods struggle to effectively explore the tradeoffs between the objectives. The predefined preferences for objectives can potentially overlook superior solutions, particularly in scenarios where system dynamics are non-convex and non-linear [24]. This limitation hampers the ability to find optimal solutions that strike the best balance among conflicting objectives.\nMeta-heuristic methods such as the Non-dominated Sorting Genetic Algorithm (NSGA-II) [25] and Multiple Objective Particle Swarm Optimization (MOPSO) [26] have been employed in optimizing MG operation to allow for an explicit search of the Pareto-optimal solutions [27], [28]. 
These multiobjective optimization algorithms use population sorting techniques to guide search and maintain a set of non-dominated solutions, resulting in a diverse set of Pareto-optimal solutions for decision-makers without specifying predefined preferences for objectives. In addition, as simulation-based optimization, the meta-heuristic methods can handle more complex system formulations that are non-linear and non-convex, making them a good candidate class of solution tools for the demand-side energy management problem [29].\nIn this paper, we present a novel framework that combines the strengths of multi-objective optimization and RL to tackle the energy management problem of MGs. We use the Borg Multi-Objective Evolutionary Algorithm (MOEA) [30], which has been proven to meet or exceed the performance of other MOEAs in complex system planning and operation applications [31]–[33], explicitly exploring their tradeoffs in the higher-dimensional objective spaces. Our framework proposes a model-free policy approximation approach, enabling the agent to interact with the unknown environment in continuous state and action spaces while managing the computation complexity of the stochastic MG control problem of focus. The proposed Multi-Objective Reinforcement Learning (MORL) framework is applied to the challenging abstraction of the Combined Heat and Power (CHP) CU-MG, demonstrating its efficacy in a real-world application. Using historical weather, demand, and utility price data, the trained agent (policy) makes collaborative decisions for multiple energy sources under stochastic state conditions. 
This model-free policy approximation approach not only reduces computational complexity but also provides interpretability in the trained parametric policy, allowing us to understand how the agent utilizes exogenous information to shape its dynamic and adaptive control actions.\n1Pareto-optimal solution refers to a solution where no objective can be improved without sacrificing another objective.\nThe primary contributions of this study are as follows: • a novel framework that combines the power of multi-\nobjective optimization and RL to support MG energy management under uncertainty, • a data-driven model-free policy approximation that reduces computation complexity while providing interpretability to the policy trained under multiple conflicting objectives, and • demonstrated performance on an existing CHP MG to highlight the tradeoffs between multiple objectives and the importance of exogenous information in improving stochastic decisions relative to current operations.\nThe rest of this paper is organized as follows. Section II introduces the real-world CU-MG and the system modeling formulations. Section III presents the proposed MORL framework and formulates the multi-objective energy management problem under it. Section IV demonstrates the numerical results for the test case. Section V concludes this paper.
"""

# Split the text into fixed-size chunks of 512 characters (not tokens; chunks may cut words in half)
# and generate a summary for each chunk
segment_size = 512
segments = [input_text[i:i+segment_size] for i in range(0, len(input_text), segment_size)]
generated_summaries = [generate_summary(segment) for segment in segments]

# Combine the generated summaries into a final summary
final_summary = " ".join(generated_summaries)
print(final_summary)


As the urgency to reduce greenhouse gas emissions grows, Microgrids(MGs) are emerging as an effective and efficient solution for integrating distributed renewable resources. Compared to centralized power grids, MGs leverage localized generation, minimizing transmission losses and eliminating the need for significant infrastructure adjustments. Another noteworthy advantage of MGs is their ability to enhance grid resilience by operating in “islanded mode” The energy management of a MG, involving distributed generation units (both conventional and renewable) and energy storage units, requires making sequential decisions in the face of uncertainties introduced by renewable energy sources. MGs play a crucial role in electrifying rural areas, leading to a substantial surge in their numbers. (e-mail: ml2589@cornell.edu) P. Reed and D. Gold are with Civil and Environmental Engineering, Cornell University, Ithaca, NY, 14853, USA. G. Quist is with the Facilities and Campus Service. L. Anderson i
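
The cell above slices the input every 512 characters, so a chunk can begin or end in the middle of a word. A minimal sketch of one alternative, assuming whitespace-separated words are an acceptable unit: pack whole words greedily into segments of at most `max_chars` characters. The helper name `split_into_segments` and its `max_chars` parameter are hypothetical and not part of the notebook.

```python
def split_into_segments(text, max_chars=512):
    """Greedily pack whitespace-separated words into segments of at most max_chars characters.

    A single word longer than max_chars is kept whole, so that one segment may exceed the limit.
    """
    segments = []
    current = []       # words collected for the segment being built
    current_len = 0    # length of " ".join(current)
    for word in text.split():
        # one extra character for the joining space when the segment is non-empty
        extra = len(word) + (1 if current else 0)
        if current and current_len + extra > max_chars:
            segments.append(" ".join(current))
            current = [word]
            current_len = len(word)
        else:
            current.append(word)
            current_len += extra
    if current:
        segments.append(" ".join(current))
    return segments


example_segments = split_into_segments("aa bb cc dd", max_chars=5)
print(example_segments)
```

Each resulting segment could then be passed to `generate_summary` in place of the character slices.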

## Training

In [None]:
#!/usr/bin/env python3
from datasets import load_dataset, load_metric
from transformers import (
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
    AutoTokenizer,
    AutoModelForSeq2SeqLM,
)

# load rouge
rouge = load_metric("rouge")

# load pubmed
pubmed_train = load_dataset("scientific_papers", "pubmed", ignore_verifications=True, split="train")
pubmed_val = load_dataset("scientific_papers", "pubmed", ignore_verifications=True, split="validation[:10%]")

# uncomment the following lines for a quick test run on a small subset
# pubmed_train = pubmed_train.select(range(32))
# pubmed_val = pubmed_val.select(range(32))

# load tokenizer
tokenizer = AutoTokenizer.from_pretrained("allenai/led-large-16384")


# max encoder length is 8192 for PubMed
encoder_max_length = 8192
decoder_max_length = 512
batch_size = 2


def process_data_to_model_inputs(batch):
    # tokenize the inputs and labels
    inputs = tokenizer(
        batch["article"],
        padding="max_length",
        truncation=True,
        max_length=encoder_max_length,
    )
    outputs = tokenizer(
        batch["abstract"],
        padding="max_length",
        truncation=True,
        max_length=decoder_max_length,
    )

    batch["input_ids"] = inputs.input_ids
    batch["attention_mask"] = inputs.attention_mask

    # create 0 global_attention_mask lists
    batch["global_attention_mask"] = len(batch["input_ids"]) * [
        [0 for _ in range(len(batch["input_ids"][0]))]
    ]

    # the outer list holds references to one shared inner list, so the following line
    # sets position 0 (the <s> token) to 1 for every sample in the batch
    batch["global_attention_mask"][0][0] = 1
    batch["labels"] = outputs.input_ids

    # We have to make sure that the PAD token is ignored
    batch["labels"] = [
        [-100 if token == tokenizer.pad_token_id else token for token in labels]
        for labels in batch["labels"]
    ]

    return batch


# map train data
pubmed_train = pubmed_train.map(
    process_data_to_model_inputs,
    batched=True,
    batch_size=batch_size,
    remove_columns=["article", "abstract", "section_names"],
)

# map val data
pubmed_val = pubmed_val.map(
    process_data_to_model_inputs,
    batched=True,
    batch_size=batch_size,
    remove_columns=["article", "abstract", "section_names"],
)

# set Python list to PyTorch tensor
pubmed_train.set_format(
    type="torch",
    columns=["input_ids", "attention_mask", "global_attention_mask", "labels"],
)

# set Python list to PyTorch tensor
pubmed_val.set_format(
    type="torch",
    columns=["input_ids", "attention_mask", "global_attention_mask", "labels"],
)

# enable fp16 apex training
training_args = Seq2SeqTrainingArguments(
    predict_with_generate=True,
    evaluation_strategy="steps",
    per_device_train_batch_size=batch_size,
    per_device_eval_batch_size=batch_size,
    fp16=True,
    fp16_backend="apex",
    output_dir="./",
    logging_steps=250,
    eval_steps=5000,
    save_steps=500,
    warmup_steps=1500,
    save_total_limit=2,
    gradient_accumulation_steps=4,
)


# compute Rouge score during validation
def compute_metrics(pred):
    labels_ids = pred.label_ids
    pred_ids = pred.predictions

    pred_str = tokenizer.batch_decode(pred_ids, skip_special_tokens=True)
    labels_ids[labels_ids == -100] = tokenizer.pad_token_id
    label_str = tokenizer.batch_decode(labels_ids, skip_special_tokens=True)

    rouge_output = rouge.compute(
        predictions=pred_str, references=label_str, rouge_types=["rouge2"]
    )["rouge2"].mid

    return {
        "rouge2_precision": round(rouge_output.precision, 4),
        "rouge2_recall": round(rouge_output.recall, 4),
        "rouge2_fmeasure": round(rouge_output.fmeasure, 4),
    }


# load model + enable gradient checkpointing & disable cache for checkpointing
led = AutoModelForSeq2SeqLM.from_pretrained("allenai/led-large-16384", gradient_checkpointing=True, use_cache=False)

# set generate hyperparameters
led.config.num_beams = 4
led.config.max_length = 512
led.config.min_length = 100
led.config.length_penalty = 2.0
led.config.early_stopping = True
led.config.no_repeat_ngram_size = 3


# instantiate trainer
trainer = Seq2SeqTrainer(
    model=led,
    tokenizer=tokenizer,
    args=training_args,
    compute_metrics=compute_metrics,
    train_dataset=pubmed_train,
    eval_dataset=pubmed_val,
)

# start training
trainer.train()
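
The preprocessing function in the cell above builds two masks: a global attention mask with a 1 on the first token only, and labels in which padding positions are replaced by `-100` so that the loss ignores them. The sketch below reproduces both steps on plain Python lists. The helper names and the pad token id of `1` are illustrative assumptions, not values taken from the notebook. It also shows why multiplying a list of lists shares one inner list across all rows.

```python
def build_global_attention_mask(num_samples, seq_len):
    # One independent list per sample, with global attention on position 0 only.
    return [[1] + [0] * (seq_len - 1) for _ in range(num_samples)]


def mask_pad_labels(label_ids, pad_token_id, ignore_index=-100):
    # Replace padding ids with ignore_index so they do not contribute to the loss.
    return [
        [ignore_index if token == pad_token_id else token for token in labels]
        for labels in label_ids
    ]


# List multiplication copies references: both rows are the same list object,
# so writing to one row is visible in the other.
shared = 2 * [[0, 0, 0]]
shared[0][0] = 1
print(shared, shared[0] is shared[1])

# The comprehension-based helper yields the same values with independent rows.
independent = build_global_attention_mask(2, 3)
print(independent, independent[0] is independent[1])

# pad_token_id=1 is an arbitrary example value here.
print(mask_pad_labels([[5, 6, 1, 1]], pad_token_id=1))
```

Both constructions give the same mask values here; the independent version only matters if individual samples later need different global attention positions.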

## Evaluation

In [None]:
import torch

from datasets import load_dataset, load_metric
from transformers import LEDTokenizer, LEDForConditionalGeneration

# load the arXiv test split and keep the first 25 examples
a_test = load_dataset("scientific_papers", "arxiv", ignore_verifications=True, split="test")
arxiv_test = a_test.select(range(25))
# load tokenizer and model
tokenizer = LEDTokenizer.from_pretrained("patrickvonplaten/led-large-16384-pubmed")
model = LEDForConditionalGeneration.from_pretrained("patrickvonplaten/led-large-16384-pubmed").to("cuda").half()


def generate_answer(batch):
  inputs_dict = tokenizer(batch["article"], padding="max_length", max_length=8192, return_tensors="pt", truncation=True)
  input_ids = inputs_dict.input_ids.to("cuda")
  attention_mask = inputs_dict.attention_mask.to("cuda")
  global_attention_mask = torch.zeros_like(attention_mask)
  # put global attention on <s> token
  global_attention_mask[:, 0] = 1

  predicted_abstract_ids = model.generate(input_ids, attention_mask=attention_mask, global_attention_mask=global_attention_mask)
  batch["predicted_abstract"] = tokenizer.batch_decode(predicted_abstract_ids, skip_special_tokens=True)
  return batch


result = arxiv_test.map(generate_answer, batched=True, batch_size=4)

# load rouge
rouge = load_metric("rouge")

print("Result:", rouge.compute(predictions=result["predicted_abstract"], references=result["abstract"], rouge_types=["rouge2"])["rouge2"].mid)
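
The evaluation above reports ROUGE-2, which measures bigram overlap between the predicted and the reference abstract. As a rough illustration of what the precision, recall, and F-measure fields mean, here is a simplified sketch in plain Python. It assumes lowercased whitespace tokenization and scores a single pair, whereas the `rouge` metric used above applies its own tokenization and aggregates over the whole dataset, so its numbers will differ. The function names are hypothetical.

```python
from collections import Counter


def bigram_counts(tokens):
    # Multiset of adjacent token pairs.
    return Counter(zip(tokens, tokens[1:]))


def rouge2_sketch(prediction, reference):
    """Return (precision, recall, fmeasure) of bigram overlap for one pair of strings."""
    pred = bigram_counts(prediction.lower().split())
    ref = bigram_counts(reference.lower().split())
    # Counter intersection keeps the minimum count of each shared bigram.
    overlap = sum((pred & ref).values())
    precision = overlap / sum(pred.values()) if pred else 0.0
    recall = overlap / sum(ref.values()) if ref else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    fmeasure = 2 * precision * recall / (precision + recall)
    return precision, recall, fmeasure


scores = rouge2_sketch("the cat sat on the mat", "the cat lay on the mat")
print(scores)
```

In this example three of the five bigrams in each sentence match, so precision and recall are both 3/5.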

In [None]:
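from transformers import AutoTokenizer
import torch

tokenizer = AutoTokenizer.from_pretrained('google/pegasus-wikihow')
# `model`, `optimizer`, `scheduler` and `train_loader` are assumed to be defined in earlier cells
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")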

for epoch in range(1):
    for batch in train_loader:

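            # tokenize the whole batch once; the tokenizer's own attention mask marks padding with 0
            # (the inputs are already padded to 512, so a mask built from their length would be all ones)
            encoded = tokenizer(list(batch['article']), padding='max_length', truncation=True, max_length=512)

            batch['article'] = encoded['input_ids']
            batch['attention_mask'] = encoded['attention_mask']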

            batch['decoder_input_ids'] = [b.tolist() for b in batch['decoder_input_ids']]
            batch['decoder_attention_mask'] = [b.tolist() for b in batch['decoder_attention_mask']]

            batch['labels'] = [b.tolist() for b in batch['labels']]

            batch = {
                'input_ids': torch.tensor(batch['article']).to(device),
                'attention_mask': torch.tensor(batch['attention_mask']).to(device),
                'decoder_input_ids': torch.tensor(batch['decoder_input_ids']).to(device),
                'decoder_attention_mask': torch.tensor(batch['decoder_attention_mask']).to(device),
                'labels': torch.tensor(batch['labels']).to(device)

               }

            inputs = {
                "input_ids": batch['input_ids'],
                "attention_mask": batch['attention_mask'],
                "decoder_input_ids": batch['decoder_input_ids'],
                "decoder_attention_mask": batch['decoder_attention_mask'],
                "labels": batch['labels'],
            }

            optimizer.zero_grad()

            outputs = model(**inputs)
            loss = outputs.loss
            loss.backward()

            optimizer.step()
            scheduler.step()
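
The loop above calls `scheduler.step()` after every optimizer step, but the scheduler itself is not defined in this notebook. One common choice, assumed here purely for illustration, is a linear warmup followed by a linear decay of the learning rate. The sketch below computes the multiplier applied to the base learning rate at a given step; the function name and the `total_steps` argument are hypothetical.

```python
def linear_warmup_decay(step, warmup_steps, total_steps):
    """Learning-rate multiplier: rises linearly from 0 to 1, then falls linearly back to 0."""
    if step < warmup_steps:
        return step / max(1, warmup_steps)
    return max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))


# Multiplier at a few steps for 10 warmup steps out of 100 total steps.
for step in (0, 5, 10, 55, 100):
    print(step, linear_warmup_decay(step, warmup_steps=10, total_steps=100))
```

A function of this shape could be wrapped in `torch.optim.lr_scheduler.LambdaLR` to obtain an object with a `step()` method, assuming the optimizer was created with the desired peak learning rate.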




In [None]:
model.eval()

generated_summaries = []
target_summaries = []

with torch.no_grad():
    for batch in test_loader:
        inputs = {
            "input_ids": torch.stack(batch["input_ids"]).to(device),
            "attention_mask": torch.stack(batch["attention_mask"]).to(device),
            "decoder_input_ids": torch.stack(batch["decoder_input_ids"]).to(device),
            "decoder_attention_mask": torch.stack(batch["decoder_attention_mask"]).to(device),
            "labels": torch.stack(batch["labels"]).to(device),
        }

        outputs = model.generate(
            # generate from the encoder inputs (the article), not from the target summary tokens
            input_ids=inputs["input_ids"],
            attention_mask=inputs["attention_mask"],
            max_length=128,
            num_beams=4,
            num_return_sequences=1,
            early_stopping=True,
        )
        generated_summaries.extend(tokenizer.batch_decode(outputs, skip_special_tokens=True))
        target_summaries.extend(tokenizer.batch_decode(inputs["labels"], skip_special_tokens=True))

# Print the generated and target summaries
for generated_summary, target_summary in zip(generated_summaries, target_summaries):
    print("Generated Summary:", generated_summary)
    print("Target Summary:", target_summary)
    print()


In [None]:
model.train()
learning_rate = 1e-4
num_epochs = 1

optimizer = torch.optim.AdamW(model.parameters(), lr=learning_rate)

for epoch in range(num_epochs):
    total_loss = 0

    for batch in train_loader:
        inputs = {
            "input_ids": torch.stack(batch["input_ids"]).squeeze(dim=1).cpu(),
            "attention_mask": torch.stack(batch["attention_mask"]).squeeze(dim=1).cpu(),
            "decoder_input_ids": torch.stack(batch["decoder_input_ids"]).squeeze(dim=1).cpu(),
            "decoder_attention_mask": torch.stack(batch["decoder_attention_mask"]).squeeze(dim=1).cpu(),
            "labels": torch.stack(batch["labels"]).squeeze(dim=1).cpu(),
        }

        optimizer.zero_grad()

        inputs = {k: v.to(device) for k, v in inputs.items()}  # Move tensors back to GPU

        outputs = model(**inputs)
        loss = outputs.loss

        loss.backward()
        optimizer.step()

        total_loss += loss.item()

    avg_loss = total_loss / len(train_loader)
    print(f"Epoch {epoch+1}/{num_epochs} - Average Loss: {avg_loss}")
